Introduction to the Theory of Computation Some Notes for CIS262


Introduction to the Theory of Computation
Some Notes for CIS262

Jean Gallier
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail:

© Jean Gallier
Please, do not reproduce without permission of the author

December 26, 2017


Contents

1 Introduction

2 Basics of Formal Language Theory
  Alphabets, Strings, Languages
  Operations on Languages

3 DFA's, NFA's, Regular Languages
  Deterministic Finite Automata (DFA's)
  The Cross-product Construction
  Nondeterministic Finite Automata (NFA's)
  ǫ-closure
  Converting an NFA into a DFA
  Finite State Automata With Output: Transducers
  An Application of NFA's: Text Search

4 Hidden Markov Models (HMMs)
  Hidden Markov Models (HMMs)
  The Viterbi Algorithm and the Forward Algorithm

5 Regular Languages, Minimization of DFA's
  Directed Graphs and Paths
  Labeled Graphs and Automata
  The Closure Definition of the Regular Languages
  Regular Expressions
  Regular Expressions and Regular Languages
  Regular Expressions and NFA's
  Applications of Regular Expressions
  Summary of Closure Properties of the Regular Languages
  Right-Invariant Equivalence Relations on Σ*
  Finding minimal DFA's
  State Equivalence and Minimal DFA's
  The Pumping Lemma
  A Fast Algorithm for Checking State Equivalence

6 Context-Free Grammars And Languages
  Context-Free Grammars
  Derivations and Context-Free Languages
  Normal Forms for Context-Free Grammars
  Regular Languages are Context-Free
  Useless Productions in Context-Free Grammars
  The Greibach Normal Form
  Least Fixed-Points
  Context-Free Languages as Least Fixed-Points
  Least Fixed-Points and the Greibach Normal Form
  Tree Domains and Gorn Trees
  Derivation Trees
  Ogden's Lemma
  Pushdown Automata
  From Context-Free Grammars To PDA's
  From PDA's To Context-Free Grammars
  The Chomsky-Schutzenberger Theorem

7 A Survey of LR-Parsing Methods
  LR(0)-Characteristic Automata
  Shift/Reduce Parsers
  Computation of FIRST
  The Intuition Behind the Shift/Reduce Algorithm
  The Graph Method for Computing Fixed Points
  Computation of FOLLOW
  Algorithm Traverse
  More on LR(0)-Characteristic Automata
  LALR(1)-Lookahead Sets
  Computing FIRST, FOLLOW, etc. in the Presence of ǫ-rules
  LR(1)-Characteristic Automata

8 RAM Programs, Turing Machines
  Partial Functions and RAM Programs
  Definition of a Turing Machine
  Computations of Turing Machines
  RAM-computable functions are Turing-computable
  Turing-computable functions are RAM-computable
  Computably Enumerable and Computable Languages
  The Primitive Recursive Functions
  The Partial Computable Functions

9 Universal RAM Programs and the Halting Problem
  Pairing Functions
  Equivalence of Alphabets
  Coding of RAM Programs
  Kleene's T-Predicate
  A Simple Function Not Known to be Computable
  A Non-Computable Function; Busy Beavers

10 Elementary Recursive Function Theory
  Acceptable Indexings
  Undecidable Problems
  Listable (Recursively Enumerable) Sets
  Reducibility and Complete Sets
  The Recursion Theorem
  Extended Rice Theorem
  Creative and Productive Sets

11 Listable and Diophantine Sets; Hilbert's Tenth
  Diophantine Equations; Hilbert's Tenth Problem
  Diophantine Sets and Listable Sets
  Some Applications of the DPRM Theorem

12 The Post Correspondence Problem; Applications
  The Post Correspondence Problem
  Some Undecidability Results for CFG's
  More Undecidable Properties of Languages

13 Computational Complexity; P and NP
  The Class P
  Directed Graphs, Paths
  Eulerian Cycles
  Hamiltonian Cycles
  Propositional Logic and Satisfiability
  The Class NP, NP-Completeness
  The Cook-Levin Theorem

14 Some NP-Complete Problems
  Statements of the Problems
  Proofs of NP-Completeness
  Succinct Certificates, coNP, and EXP

15 Primality Testing is in NP
  Prime Numbers and Composite Numbers
  Methods for Primality Testing
  Modular Arithmetic, the Groups Z/nZ, (Z/nZ)*
  The Lucas Theorem; Lucas Trees
  Algorithms for Computing Powers Modulo m
  PRIMES is in NP

Chapter 1

Introduction

The theory of computation is concerned with algorithms and algorithmic systems: their design and representation, their completeness, and their complexity.

The purpose of these notes is to introduce some of the basic notions of the theory of computation, including concepts from formal languages and automata theory, the theory of computability, some basics of recursive function theory, and an introduction to complexity theory. Other topics such as correctness of programs will not be treated here (there just isn't enough time!).

The notes are divided into three parts. The first part is devoted to formal languages and automata. The second part deals with models of computation, recursive functions, and undecidability. The third part deals with computational complexity, in particular the classes P and NP.


Chapter 2

Basics of Formal Language Theory

2.1 Alphabets, Strings, Languages

Our view of languages is that a language is a set of strings. In turn, a string is a finite sequence of letters from some alphabet. These concepts are defined rigorously as follows.

Definition 2.1. An alphabet Σ is any finite set. We often write Σ = {a_1, ..., a_k}. The a_i are called the symbols of the alphabet.

Examples:

Σ = {a}
Σ = {a, b, c}
Σ = {0, 1}
Σ = {α, β, γ, δ, ǫ, λ, ϕ, ψ, ω, µ, ν, ρ, σ, η, ξ, ζ}

A string is a finite sequence of symbols. Technically, it is convenient to define strings as functions. For any integer n ≥ 1, let

[n] = {1, 2, ..., n},

and for n = 0, let [0] = ∅.

Definition 2.2. Given an alphabet Σ, a string over Σ (or simply a string) of length n is any function u: [n] → Σ. The integer n is the length of the string u, and it is denoted as |u|. When n = 0, the special string u: [0] → Σ of length 0 is called the empty string, or null string, and is denoted as ǫ.

Given a string u: [n] → Σ of length n ≥ 1, u(i) is the i-th letter in the string u. For simplicity of notation, we denote the string u as

u = u_1 u_2 ... u_n,

with each u_i ∈ Σ.

For example, if Σ = {a, b} and u: [3] → Σ is defined such that u(1) = a, u(2) = b, and u(3) = a, we write u = aba. Other examples of strings are

work, fun, gabuzomeuh.

Strings of length 1 are functions u: [1] → Σ simply picking some element u(1) = a_i in Σ. Thus, we will identify every symbol a_i ∈ Σ with the corresponding string of length 1.

The set of all strings over an alphabet Σ, including the empty string, is denoted as Σ*. Observe that when Σ = ∅, then ∅* = {ǫ}. When Σ ≠ ∅, the set Σ* is countably infinite. Later on, we will see ways of ordering and enumerating strings.

Strings can be juxtaposed, or concatenated.

Definition 2.3. Given an alphabet Σ, given any two strings u: [m] → Σ and v: [n] → Σ, the concatenation u · v (also written uv) of u and v is the string uv: [m+n] → Σ, defined such that

uv(i) = u(i)      if 1 ≤ i ≤ m,
uv(i) = v(i − m)  if m+1 ≤ i ≤ m+n.

In particular, uǫ = ǫu = u. Observe that |uv| = |u| + |v|. For example, if u = ga and v = buzo, then uv = gabuzo. It is immediately verified that

u(vw) = (uv)w.
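The two-case definition of uv can be checked directly in code. The following is a minimal Python sketch (the representation and the name `concat` are ours, not from the notes), modeling a string as a tuple indexed from 1 via an offset:

```python
def concat(u, v):
    """Concatenate two strings-as-functions u: [m] -> Sigma and v: [n] -> Sigma.
    uv(i) = u(i) for 1 <= i <= m, and uv(i) = v(i - m) for m+1 <= i <= m+n."""
    m, n = len(u), len(v)
    return tuple(u[i - 1] if i <= m else v[i - m - 1] for i in range(1, m + n + 1))

u = ("g", "a")            # the string ga
v = ("b", "u", "z", "o")  # the string buzo
w = tuple("meuh")
assert concat(u, v) == tuple("gabuzo")
assert concat(u, ()) == u and concat((), u) == u  # u.eps = eps.u = u
assert concat(u, concat(v, w)) == concat(concat(u, v), w)  # u(vw) = (uv)w
```

The last assertion illustrates the associativity law u(vw) = (uv)w stated above.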

Thus, concatenation is a binary operation on Σ* which is associative and has ǫ as an identity. Note that in general uv ≠ vu, for example for u = a and v = b.

Given a string u ∈ Σ* and n ≥ 0, we define u^n recursively as follows:

u^0 = ǫ,
u^{n+1} = u^n u   (n ≥ 0).

Clearly, u^1 = u, and it is an easy exercise to show that u^n u = u u^n for all n ≥ 0. For the induction step, we have

u^{n+1} u = (u^n u)u   by definition of u^{n+1}
         = (u u^n)u   by the induction hypothesis
         = u(u^n u)   by associativity
         = u u^{n+1}   by definition of u^{n+1}.

Definition 2.4. Given an alphabet Σ, given any two strings u, v ∈ Σ*, we define the following notions:

u is a prefix of v iff there is some y ∈ Σ* such that v = uy.
u is a suffix of v iff there is some x ∈ Σ* such that v = xu.
u is a substring of v iff there are some x, y ∈ Σ* such that v = xuy.

We say that u is a proper prefix (suffix, substring) of v iff u is a prefix (suffix, substring) of v and u ≠ v.

For example, ga is a prefix of gabuzo, zo is a suffix of gabuzo, and buz is a substring of gabuzo.

Recall that a partial ordering ≤ on a set S is a binary relation ≤ ⊆ S × S which is reflexive, transitive, and antisymmetric. The concepts of prefix, suffix, and substring define binary relations on Σ* in the obvious way. It can be shown that these relations are partial orderings.

Another important ordering on strings is the lexicographic (or dictionary) ordering.
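The three relations of Definition 2.4 translate directly into code. A minimal Python sketch (function names are ours) checking the gabuzo examples:

```python
def is_prefix(u, v):
    # u is a prefix of v iff v = uy for some y
    return v[:len(u)] == u

def is_suffix(u, v):
    # u is a suffix of v iff v = xu for some x
    return len(u) <= len(v) and (len(u) == 0 or v[-len(u):] == u)

def is_substring(u, v):
    # u is a substring of v iff v = xuy for some x, y
    return u in v

assert is_prefix("ga", "gabuzo")
assert is_suffix("zo", "gabuzo")
assert is_substring("buz", "gabuzo")
assert not is_prefix("ab", "gabuzo")
assert is_prefix("", "gabuzo")  # eps is a (non-proper) prefix of every string
```

Note that ǫ is a prefix, suffix, and substring of every string, matching the definitions with x = ǫ or y = ǫ.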

Definition 2.5. Given an alphabet Σ = {a_1, ..., a_k} assumed totally ordered such that a_1 < a_2 < ... < a_k, given any two strings u, v ∈ Σ*, we define the lexicographic ordering ≼ as follows: u ≼ v

(1) if v = uy, for some y ∈ Σ*, or
(2) if u = x a_i y, v = x a_j z, and a_i < a_j, with a_i, a_j ∈ Σ, for some x, y, z ∈ Σ*.

Note that cases (1) and (2) are mutually exclusive. In case (1) u is a prefix of v. In case (2), neither u nor v is a prefix of the other. For example, ab ≼ b, and gallhager ≼ gallier.

It is fairly tedious to prove that the lexicographic ordering is in fact a partial ordering. In fact, it is a total ordering, which means that for any two strings u, v ∈ Σ*, either u ≼ v or v ≼ u.

The reversal w^R of a string w is defined inductively as follows:

ǫ^R = ǫ,
(ua)^R = a u^R,

where a ∈ Σ and u ∈ Σ*. For example,

reillag = gallier^R.

It can be shown that

(uv)^R = v^R u^R.

Thus,

(u_1 ... u_n)^R = u_n^R ... u_1^R,

and when each u_i ∈ Σ, we have

(u_1 ... u_n)^R = u_n ... u_1.

We can now define languages.

Definition 2.6. Given an alphabet Σ, a language over Σ (or simply a language) is any subset L of Σ*.

If Σ ≠ ∅, there are uncountably many languages.

A Quick Review of Finite, Infinite, Countable, and Uncountable Sets

For details and proofs, see Discrete Mathematics, by Gallier.

Let N = {0, 1, 2, ...} be the set of natural numbers. Recall that a set X is finite if there is some natural number n ∈ N and a bijection between X and the set [n] = {1, 2, ..., n}. (When n = 0, X = ∅, the empty set.) The number n is uniquely determined. It is called the cardinality (or size) of X and is denoted by |X|. A set is infinite iff it is not finite.

Recall that any injection or surjection of a finite set to itself is in fact a bijection. The above fails for infinite sets.

The pigeonhole principle asserts that there is no bijection between a finite set X and any proper subset Y of X. Consequence: if we think of X as a set of n pigeons and if there are only m < n boxes (corresponding to the elements of Y), then at least two of the pigeons must share the same box.

As a consequence of the pigeonhole principle, a set X is infinite iff it is in bijection with a proper subset of itself. For example, we have a bijection n ↦ 2n between N and the set 2N of even natural numbers, a proper subset of N, so N is infinite.

A set X is countable (or denumerable) if there is an injection from X into N. If X is not the empty set, then X is countable iff there is a surjection from N onto X. It can be shown that a set X is countable iff either it is finite or it is in bijection with N.

We will see later that N × N is countable. As a consequence, the set Q of rational numbers is countable. A set is uncountable if it is not countable. For example, R (the set of real numbers) is uncountable. Similarly, (0,1) = {x ∈ R | 0 < x < 1} is uncountable. However, there is a bijection between (0,1) and R (find one!). The set 2^N of all subsets of N is uncountable.

If Σ ≠ ∅, then the set Σ* of all strings over Σ is infinite and countable. Suppose |Σ| = k, with Σ = {a_1, ..., a_k}.

If k = 1, write a = a_1, and then

{a}* = {ǫ, a, aa, aaa, ..., a^n, ...}.

We have the bijection n ↦ a^n from N to {a}*.

If k ≥ 2, then we can think of the string u = a_{i_1} ... a_{i_n} as a representation of the integer ν(u) in base k shifted by (k^n − 1)/(k − 1):

ν(u) = i_1 k^{n−1} + i_2 k^{n−2} + ... + i_{n−1} k + i_n
     = (k^n − 1)/(k − 1) + (i_1 − 1)k^{n−1} + ... + (i_{n−1} − 1)k + (i_n − 1)

(with ν(ǫ) = 0).

We leave it as an exercise to show that ν: Σ* → N is a bijection. In fact, ν corresponds to the enumeration of Σ* where u precedes v if |u| < |v|, and u precedes v in the lexicographic ordering if |u| = |v|. For example, if k = 2 and if we write Σ = {a, b}, then the enumeration begins with

ǫ, a, b, aa, ab, ba, bb.

On the other hand, if Σ ≠ ∅, the set 2^{Σ*} of all subsets of Σ* (all languages) is uncountable. Indeed, we can show that there is no surjection from N onto 2^{Σ*}. First, we show that there is no surjection from Σ* onto 2^{Σ*}.

We claim that if there is no surjection from Σ* onto 2^{Σ*}, then there is no surjection from N onto 2^{Σ*} either. Assume by contradiction that there is a surjection g: N → 2^{Σ*}. But, if Σ ≠ ∅, then Σ* is infinite and countable, thus we have the bijection ν: Σ* → N. Then the composition

Σ* --ν--> N --g--> 2^{Σ*}

is a surjection, because the bijection ν is a surjection, g is a surjection, and the composition of surjections is a surjection, contradicting the hypothesis that there is no surjection from Σ* onto 2^{Σ*}.

To prove that there is no surjection from Σ* onto 2^{Σ*}, we use a diagonalization argument. This is an instance of Cantor's Theorem.
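A quick way to see that ν enumerates Σ* first by length and then lexicographically is to compute it. The sketch below (the name `nu` and the encoding are ours) uses the Horner form n ↦ n·k + i_j of the sum i_1 k^{n−1} + ... + i_n:

```python
def nu(u, alphabet):
    """nu(u) = i_1 k^(n-1) + ... + i_n for u = a_{i_1}...a_{i_n}, nu(eps) = 0."""
    k = len(alphabet)
    idx = {a: i + 1 for i, a in enumerate(alphabet)}  # a_1 -> 1, ..., a_k -> k
    n = 0
    for c in u:
        n = n * k + idx[c]  # Horner evaluation of the shifted base-k sum
    return n

sigma = ["a", "b"]
words = ["", "a", "b", "aa", "ab", "ba", "bb"]
# nu follows exactly the enumeration eps, a, b, aa, ab, ba, bb, ...
assert [nu(w, sigma) for w in words] == [0, 1, 2, 3, 4, 5, 6]
```

Each step multiplies by k and adds the 1-based index of the next symbol, which is precisely the displayed formula for ν(u) with digits drawn from {1, ..., k} instead of {0, ..., k−1}.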

Theorem 2.1. (Cantor) There is no surjection from Σ* onto 2^{Σ*}.

Proof. Assume there is a surjection h: Σ* → 2^{Σ*}, and consider the set

D = {u ∈ Σ* | u ∉ h(u)}.

By definition, for any u we have u ∈ D iff u ∉ h(u). Since h is surjective, there is some w ∈ Σ* such that h(w) = D. Then, by definition of D and since D = h(w), we have

w ∈ D iff w ∉ h(w) = D,

a contradiction. Therefore h is not surjective.

Therefore, if Σ ≠ ∅, then 2^{Σ*} is uncountable. We will try to single out countable "tractable" families of languages. We will begin with the family of regular languages, and then proceed to the context-free languages. We now turn to operations on languages.

2.2 Operations on Languages

A way of building more complex languages from simpler ones is to combine them using various operations. First, we review the set-theoretic operations of union, intersection, and complementation. Given some alphabet Σ, for any two languages L_1, L_2 over Σ, the union L_1 ∪ L_2 of L_1 and L_2 is the language

L_1 ∪ L_2 = {w ∈ Σ* | w ∈ L_1 or w ∈ L_2}.

The intersection L_1 ∩ L_2 of L_1 and L_2 is the language

L_1 ∩ L_2 = {w ∈ Σ* | w ∈ L_1 and w ∈ L_2}.

The difference L_1 − L_2 of L_1 and L_2 is the language

L_1 − L_2 = {w ∈ Σ* | w ∈ L_1 and w ∉ L_2}.

The difference is also called the relative complement. A special case of the difference is obtained when L_1 = Σ*, in which case we define the complement of a language L as

L̄ = {w ∈ Σ* | w ∉ L}.

The above operations do not use the structure of strings. The following operations use concatenation.

Definition 2.7. Given an alphabet Σ, for any two languages L_1, L_2 over Σ, the concatenation L_1 L_2 of L_1 and L_2 is the language

L_1 L_2 = {w ∈ Σ* | ∃u ∈ L_1, ∃v ∈ L_2, w = uv}.

For any language L, we define L^n as follows:

L^0 = {ǫ},
L^{n+1} = L^n L   (n ≥ 0).

The following properties are easily verified:

L∅ = ∅,
∅L = ∅,
L{ǫ} = L,
{ǫ}L = L,
(L_1 ∪ {ǫ})L_2 = L_1 L_2 ∪ L_2,
L_1 (L_2 ∪ {ǫ}) = L_1 L_2 ∪ L_1,
L^n L = L L^n.

In general, L_1 L_2 ≠ L_2 L_1.

So far, the operations that we have introduced, except complementation (since L̄ = Σ* − L is infinite if L is finite and Σ is nonempty), preserve the finiteness of languages. This is not the case for the next two operations.

Definition 2.8. Given an alphabet Σ, for any language L over Σ, the Kleene *-closure L* of L is the language

L* = ⋃_{n≥0} L^n.

The Kleene +-closure L^+ of L is the language

L^+ = ⋃_{n≥1} L^n.

Thus, L* is the infinite union

L* = L^0 ∪ L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ...,
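For finite languages, concatenation and the powers L^n can be computed directly, which makes the listed identities easy to spot-check. A small Python sketch (names `cat` and `power` are ours):

```python
def cat(L1, L2):
    """Concatenation of two finite languages: {uv | u in L1, v in L2}."""
    return {u + v for u in L1 for v in L2}

def power(L, n):
    """L^0 = {eps}, L^(n+1) = L^n L."""
    P = {""}  # {eps}
    for _ in range(n):
        P = cat(P, L)
    return P

L1, L2 = {"a", "ab"}, {"b", ""}
assert cat(L1, set()) == set()          # L.emptyset = emptyset
assert cat(L1, {""}) == L1              # L{eps} = L
assert cat(L1, L2) != cat(L2, L1)       # concatenation is not commutative in general
assert power({"ab"}, 2) == {"abab"}
```

Here the empty Python string "" plays the role of ǫ, and string concatenation `u + v` plays the role of uv.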

and L^+ is the infinite union

L^+ = L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ....

Since L^1 = L, both L* and L^+ contain L. In fact,

L^+ = {w ∈ Σ* | ∃n ≥ 1, ∃u_1 ∈ L, ..., ∃u_n ∈ L, w = u_1 ... u_n},

and since L^0 = {ǫ},

L* = {ǫ} ∪ {w ∈ Σ* | ∃n ≥ 1, ∃u_1 ∈ L, ..., ∃u_n ∈ L, w = u_1 ... u_n}.

Thus, the language L* always contains ǫ, and we have

L* = L^+ ∪ {ǫ}.

However, if ǫ ∉ L, then ǫ ∉ L^+. The following is easily shown:

∅* = {ǫ},
L^+ = L* L,
L** = L*,
L* L* = L*.

The Kleene closures have many other interesting properties.

Homomorphisms are also very useful. Given two alphabets Σ and Δ, a homomorphism h: Σ* → Δ* between Σ* and Δ* is a function h: Σ* → Δ* such that

h(uv) = h(u)h(v)   for all u, v ∈ Σ*.

Letting u = v = ǫ, we get

h(ǫ) = h(ǫ)h(ǫ),

which implies that (why?)

h(ǫ) = ǫ.

If Σ = {a_1, ..., a_k}, it is easily seen that h is completely determined by h(a_1), ..., h(a_k) (why?).

Example: Σ = {a, b, c}, Δ = {0, 1}, and

h(a) = 01, h(b) = 011, h(c) = 0111.

For example,

h(abbc) = 010110110111.

Given any language L_1 ⊆ Σ*, we define the image h(L_1) of L_1 as

h(L_1) = {h(u) ∈ Δ* | u ∈ L_1}.

Given any language L_2 ⊆ Δ*, we define the inverse image h^{−1}(L_2) of L_2 as

h^{−1}(L_2) = {u ∈ Σ* | h(u) ∈ L_2}.

We now turn to the first formalism for defining languages, Deterministic Finite Automata (DFA's).
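Since h is determined by its values on the symbols, extending it to strings is a one-liner. A Python sketch (the name `hom` is ours; we take h(c) = 0111, continuing the 01, 011 pattern of the example, whose value for c is garbled in this copy):

```python
def hom(word, table):
    """Extend a map on symbols to a homomorphism on strings:
    h(u_1...u_n) = h(u_1)...h(u_n), with h(eps) = eps."""
    return "".join(table[c] for c in word)

h = {"a": "01", "b": "011", "c": "0111"}
assert hom("", h) == ""                                # h(eps) = eps
assert hom("ab", h) == hom("a", h) + hom("b", h)       # h(uv) = h(u)h(v)
assert hom("abbc", h) == "01" + "011" + "011" + "0111" # == "010110110111"
```

The image and inverse image of finite languages can then be computed with set comprehensions over `hom`.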

Chapter 3

DFA's, NFA's, Regular Languages

The family of regular languages is the simplest, yet interesting, family of languages. We give six definitions of the regular languages:

1. Using deterministic finite automata (DFAs).
2. Using nondeterministic finite automata (NFAs).
3. Using a closure definition involving union, concatenation, and Kleene *.
4. Using regular expressions.
5. Using right-invariant equivalence relations of finite index (the Myhill-Nerode characterization).
6. Using right-linear context-free grammars.

We prove the equivalence of these definitions, often by providing an algorithm for converting one formulation into another. We find that the introduction of NFA's is motivated by the conversion of regular expressions into DFA's. To finish this conversion, we also show that every NFA can be converted into a DFA (using the subset construction). So, although NFA's often allow for more concise descriptions, they do not have more expressive power than DFA's. NFA's operate according to the paradigm: guess a successful path, and check it in polynomial time. This is the essence of an important class of hard problems known as NP, which will be investigated later.

We will also discuss methods for proving that certain languages are not regular (Myhill-Nerode, pumping lemma). We present algorithms to convert a DFA to an equivalent one with a minimal number of states.

3.1 Deterministic Finite Automata (DFA's)

First we define what DFA's are, and then we explain how they are used to accept or reject strings. Roughly speaking, a DFA is a finite transition graph whose edges are labeled with letters from an alphabet Σ. The graph also satisfies certain properties that make it deterministic. Basically, this means that given any string w, starting from any node, there is a unique path in the graph parsing the string w.

Example 1. A DFA for the language

L_1 = {ab}^+ = {ab}*{ab},

i.e.,

L_1 = {ab, abab, ababab, ..., (ab)^n, ...}.

Input alphabet: Σ = {a, b}.
State set: Q_1 = {0, 1, 2, 3}.
Start state: 0.
Set of accepting states: F_1 = {2}.

Transition table (function) δ_1 (reading off the graph of Figure 3.1):

      a   b
  0   1   3
  1   3   2
  2   1   3
  3   3   3

Note that state 3 is a trap state or dead state. Here is a graph representation of the DFA specified by the transition function shown above:

Figure 3.1: DFA for {ab}^+ (edges 0 --a--> 1, 1 --b--> 2, 2 --a--> 1; all other transitions go to the dead state 3, which loops on a, b)

Example 2. A DFA for the language

L_2 = {ab}* = L_1 ∪ {ǫ},

i.e.,

L_2 = {ǫ, ab, abab, ababab, ..., (ab)^n, ...}.

Input alphabet: Σ = {a, b}.
State set: Q_2 = {0, 1, 2}.
Start state: 0.
Set of accepting states: F_2 = {0}.

Transition table (function) δ_2 (reading off the graph of Figure 3.2):

      a   b
  0   1   2
  1   2   0
  2   2   2

State 2 is a trap state or dead state. Here is a graph representation of the DFA specified by the transition function shown above:

Figure 3.2: DFA for {ab}* (edges 0 --a--> 1, 1 --b--> 0; all other transitions go to the dead state 2, which loops on a, b)
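Running a DFA means following the unique path that the input spells out and checking whether it ends in an accepting state. A minimal Python sketch (the dictionary encoding is ours), using the transitions of Example 1 as read from Figure 3.1:

```python
def run_dfa(delta, start, finals, w):
    """Follow the unique path from `start` spelled by w; accept iff it ends in `finals`."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Example 1's DFA for {ab}+; state 3 is the dead state.
delta1 = {(0, 'a'): 1, (0, 'b'): 3, (1, 'a'): 3, (1, 'b'): 2,
          (2, 'a'): 1, (2, 'b'): 3, (3, 'a'): 3, (3, 'b'): 3}
assert run_dfa(delta1, 0, {2}, "ab")
assert run_dfa(delta1, 0, {2}, "ababab")
assert not run_dfa(delta1, 0, {2}, "")    # eps is not in {ab}+
assert not run_dfa(delta1, 0, {2}, "aba")
```

The loop is exactly the iterative computation of the extended transition function δ* introduced below.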

Example 3. A DFA for the language

L_3 = {a, b}*{abb}.

Note that L_3 consists of all strings of a's and b's ending in abb.

Input alphabet: Σ = {a, b}.
State set: Q_3 = {0, 1, 2, 3}.
Start state: 0.
Set of accepting states: F_3 = {3}.

Transition table (function) δ_3 (reading off the graph of Figure 3.3):

      a   b
  0   1   0
  1   1   2
  2   1   3
  3   1   0

Here is a graph representation of the DFA specified by the transition function shown above:

Figure 3.3: DFA for {a,b}*{abb}

Is this a minimal DFA?

Definition 3.1. A deterministic finite automaton (or DFA) is a quintuple D = (Q, Σ, δ, q_0, F), where

Σ is a finite input alphabet;
Q is a finite set of states;

F is a subset of Q of final (or accepting) states;
q_0 ∈ Q is the start state (or initial state);
δ is the transition function, a function δ: Q × Σ → Q.

For any state p ∈ Q and any input a ∈ Σ, the state q = δ(p, a) is uniquely determined. Thus, it is possible to define the state reached from a given state p ∈ Q on input w ∈ Σ*, following the path specified by w. Technically, this is done by defining the extended transition function δ*: Q × Σ* → Q.

Definition 3.2. Given a DFA D = (Q, Σ, δ, q_0, F), the extended transition function δ*: Q × Σ* → Q is defined as follows:

δ*(p, ǫ) = p,
δ*(p, ua) = δ(δ*(p, u), a),

where a ∈ Σ and u ∈ Σ*. It is immediate that δ*(p, a) = δ(p, a) for a ∈ Σ.

The meaning of δ*(p, w) is that it is the state reached from state p following the path from p specified by w. We can show (by induction on the length of v) that

δ*(p, uv) = δ*(δ*(p, u), v)   for all p ∈ Q and all u, v ∈ Σ*.

For the induction step, for u ∈ Σ* and all v = ya with y ∈ Σ* and a ∈ Σ,

δ*(p, uya) = δ(δ*(p, uy), a)        by definition of δ*
           = δ(δ*(δ*(p, u), y), a)  by induction
           = δ*(δ*(p, u), ya)       by definition of δ*.

We can now define how a DFA accepts or rejects a string.

Definition 3.3. Given a DFA D = (Q, Σ, δ, q_0, F), the language L(D) accepted (or recognized) by D is the language

L(D) = {w ∈ Σ* | δ*(q_0, w) ∈ F}.

Thus, a string w ∈ Σ* is accepted iff the path from q_0 on input w ends in a final state.

The definition of a DFA does not prevent the possibility that a DFA may have states that are not reachable from the start state q_0, which means that there is no path from q_0 to such states. For example, in the DFA D_1 defined by a transition table over {0, 1, 2, 3, 4} (not legible in this copy) and the set of final states F = {1, 2, 3}, the states in the set {0, 1} are reachable from the start state 0, but the states in the set {2, 3, 4} are not (even though there are transitions from 2, 3, 4 to 0, but they go in the wrong direction).

Since there is no path from the start state 0 to any of the states in {2, 3, 4}, the states 2, 3, 4 are useless as far as acceptance of strings, so they should be deleted as well as the transitions from them.

Given a DFA D = (Q, Σ, δ, q_0, F), the above suggests defining the set Q_r of reachable (or accessible) states as

Q_r = {p ∈ Q | (∃u ∈ Σ*)(p = δ*(q_0, u))}.

The set Q_r consists of those states p ∈ Q such that there is some path from q_0 to p (along some string u). Computing the set Q_r is a reachability problem in a directed graph. There are various algorithms to solve this problem, including breadth-first search or depth-first search.

Once the set Q_r has been computed, we can clean up the DFA D by deleting all redundant states in Q − Q_r and all transitions from these states. More precisely, we form the DFA

D_r = (Q_r, Σ, δ_r, q_0, Q_r ∩ F),

where δ_r: Q_r × Σ → Q_r is the restriction of δ: Q × Σ → Q to Q_r.

If D_1 is the DFA of the previous example, then the DFA (D_1)_r is obtained by deleting the states 2, 3, 4 and restricting the transition table to the states 0, 1.

It can be shown that L(D_r) = L(D) (see the homework problems).
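Computing Q_r is plain graph reachability. A breadth-first Python sketch (the encoding and the example DFA are ours; the example has states 2 and 3 unreachable from the start state, mirroring the situation above):

```python
from collections import deque

def reachable(delta, start, alphabet):
    """Q_r: all states reachable from `start` by some input string (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        for a in alphabet:
            q = delta[(p, a)]
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen

# Hypothetical DFA: 2 and 3 point into {0, 1} but nothing points back at them.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1,
         (2, 'a'): 3, (2, 'b'): 0, (3, 'a'): 2, (3, 'b'): 0}
assert reachable(delta, 0, "ab") == {0, 1}
```

Trimming D to D_r then amounts to restricting `delta` and the final states to the returned set, and the emptiness test below is just `reachable(...) & F`.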

A DFA D such that Q = Q_r is said to be trim (or reduced). Observe that the DFA D_r is trim. A minimal DFA must be trim. Computing Q_r gives us a method to test whether a DFA D accepts a nonempty language. Indeed,

L(D) ≠ ∅ iff Q_r ∩ F ≠ ∅.

We now come to the first of several equivalent definitions of the regular languages.

Regular Languages, Version 1

Definition 3.4. A language L is a regular language if it is accepted by some DFA.

Note that a regular language may be accepted by many different DFAs. Later on, we will investigate how to find minimal DFA's. For a given regular language L, a minimal DFA for L is a DFA with the smallest number of states among all DFA's accepting L. A minimal DFA for L must exist since every nonempty subset of natural numbers has a smallest element.

In order to understand how complex the regular languages are, we will investigate the closure properties of the regular languages under union, intersection, complementation, concatenation, and Kleene *. It turns out that the family of regular languages is closed under all these operations. For union, intersection, and complementation, we can use the cross-product construction, which preserves determinism. However, for concatenation and Kleene *, there does not appear to be any method involving DFA's only. The way to do it is to introduce nondeterministic finite automata (NFA's), which we do a little later.

3.2 The Cross-product Construction

Let Σ = {a_1, ..., a_m} be an alphabet. Given any two DFA's D_1 = (Q_1, Σ, δ_1, q_{0,1}, F_1) and D_2 = (Q_2, Σ, δ_2, q_{0,2}, F_2), there is a very useful construction for showing that the union, the intersection, or the relative complement of regular languages is a regular language.

Given any two languages L_1, L_2 over Σ, recall that

L_1 ∪ L_2 = {w ∈ Σ* | w ∈ L_1 or w ∈ L_2},
L_1 ∩ L_2 = {w ∈ Σ* | w ∈ L_1 and w ∈ L_2},
L_1 − L_2 = {w ∈ Σ* | w ∈ L_1 and w ∉ L_2}.

Let us first explain how to construct a DFA accepting the intersection L_1 ∩ L_2. Let D_1 and D_2 be DFA's such that L_1 = L(D_1) and L_2 = L(D_2). The idea is to construct a DFA simulating D_1 and D_2 in parallel. This can be done by using states which are pairs (p_1, p_2) ∈ Q_1 × Q_2. Thus, we define the DFA D as follows:

D = (Q_1 × Q_2, Σ, δ, (q_{0,1}, q_{0,2}), F_1 × F_2),

where the transition function δ: (Q_1 × Q_2) × Σ → Q_1 × Q_2 is defined as follows:

δ((p_1, p_2), a) = (δ_1(p_1, a), δ_2(p_2, a)),

for all p_1 ∈ Q_1, p_2 ∈ Q_2, and a ∈ Σ.

Clearly, D is a DFA, since D_1 and D_2 are. Also, by the definition of δ, we have

δ*((p_1, p_2), w) = (δ_1*(p_1, w), δ_2*(p_2, w)),

for all p_1 ∈ Q_1, p_2 ∈ Q_2, and w ∈ Σ*. Now, we have

w ∈ L(D_1) ∩ L(D_2)
  iff w ∈ L(D_1) and w ∈ L(D_2),
  iff δ_1*(q_{0,1}, w) ∈ F_1 and δ_2*(q_{0,2}, w) ∈ F_2,
  iff (δ_1*(q_{0,1}, w), δ_2*(q_{0,2}, w)) ∈ F_1 × F_2,
  iff δ*((q_{0,1}, q_{0,2}), w) ∈ F_1 × F_2,
  iff w ∈ L(D).

Thus, L(D) = L(D_1) ∩ L(D_2).

We can now modify D very easily to accept L(D_1) ∪ L(D_2). We change the set of final states so that it becomes (F_1 × Q_2) ∪ (Q_1 × F_2). Indeed,

w ∈ L(D_1) ∪ L(D_2)
  iff w ∈ L(D_1) or w ∈ L(D_2),
  iff δ_1*(q_{0,1}, w) ∈ F_1 or δ_2*(q_{0,2}, w) ∈ F_2,
  iff (δ_1*(q_{0,1}, w), δ_2*(q_{0,2}, w)) ∈ (F_1 × Q_2) ∪ (Q_1 × F_2),
  iff δ*((q_{0,1}, q_{0,2}), w) ∈ (F_1 × Q_2) ∪ (Q_1 × F_2),
  iff w ∈ L(D).

Thus, L(D) = L(D_1) ∪ L(D_2).

We can also modify D very easily to accept L(D_1) − L(D_2).

We change the set of final states so that it becomes F_1 × (Q_2 − F_2). Indeed,

w ∈ L(D_1) − L(D_2)
  iff w ∈ L(D_1) and w ∉ L(D_2),
  iff δ_1*(q_{0,1}, w) ∈ F_1 and δ_2*(q_{0,2}, w) ∉ F_2,
  iff (δ_1*(q_{0,1}, w), δ_2*(q_{0,2}, w)) ∈ F_1 × (Q_2 − F_2),
  iff δ*((q_{0,1}, q_{0,2}), w) ∈ F_1 × (Q_2 − F_2),
  iff w ∈ L(D).

Thus, L(D) = L(D_1) − L(D_2).

In all cases, if D_1 has n_1 states and D_2 has n_2 states, the DFA D has n_1 n_2 states.

3.3 Nondeterministic Finite Automata (NFA's)

NFA's are obtained from DFA's by allowing multiple transitions from a given state on a given input. This can be done by defining δ(p, a) as a subset of Q rather than a single state. It will also be convenient to allow transitions on input ǫ. We let 2^Q denote the set of all subsets of Q, including the empty set. The set 2^Q is the power set of Q.

Example 4. An NFA for the language

L_3 = {a, b}*{abb}.

Input alphabet: Σ = {a, b}.
State set: Q_4 = {0, 1, 2, 3}.
Start state: 0.
Set of accepting states: F_4 = {3}.

Transition table δ_4:

      a        b
  0   {0, 1}   {0}
  1   ∅        {2}
  2   ∅        {3}
  3   ∅        ∅

Figure 3.4: NFA for {a,b}*{abb}
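An NFA without ǫ-transitions can be simulated by tracking the set of states reachable on each prefix of the input, which collapses the whole tree of computation paths into one pass. A Python sketch (the encoding is ours; missing dictionary entries stand for the empty set), using Example 4's table:

```python
def run_nfa(delta, start, finals, w):
    """Track the set of states reachable on each prefix of w.
    (No eps-transitions here; eps-closure is treated in the next section.)"""
    S = {start}
    for c in w:
        S = {q for p in S for q in delta.get((p, c), set())}
    return bool(S & finals)  # lenient acceptance: some path reaches a final state

# Example 4's NFA for {a,b}*{abb}.
delta4 = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
assert run_nfa(delta4, 0, {3}, "aabb")
assert run_nfa(delta4, 0, {3}, "abb")
assert not run_nfa(delta4, 0, {3}, "abbb")
assert not run_nfa(delta4, 0, {3}, "ba")
```

Paths that get stuck simply contribute nothing to the set S, matching the lenient acceptance criterion discussed below.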

Example 5. Let Σ = {a_1, ..., a_n}, let

L_n^i = {w ∈ Σ* | w contains an odd number of a_i's},

and let L_n = L_n^1 ∪ L_n^2 ∪ ... ∪ L_n^n. The language L_n consists of those strings in Σ* that contain an odd number of some letter a_i ∈ Σ. Equivalently, Σ* − L_n consists of those strings in Σ* with an even number of every letter a_i ∈ Σ. It can be shown that every DFA accepting L_n has at least 2^n states. However, there is an NFA with 2n + 1 states accepting L_n.

We define NFA's as follows.

Definition 3.5. A nondeterministic finite automaton (or NFA) is a quintuple N = (Q, Σ, δ, q_0, F), where

Σ is a finite input alphabet;
Q is a finite set of states;
F is a subset of Q of final (or accepting) states;
q_0 ∈ Q is the start state (or initial state);
δ is the transition function, a function δ: Q × (Σ ∪ {ǫ}) → 2^Q.

For any state p ∈ Q and any input a ∈ Σ ∪ {ǫ}, the set of states δ(p, a) is uniquely determined. We write q ∈ δ(p, a).

Given an NFA N = (Q, Σ, δ, q_0, F), we would like to define the language accepted by N. However, given an NFA N, unlike the situation for DFA's, given a state p ∈ Q and some input w ∈ Σ*, in general there is no unique path from p on input w, but instead a tree of computation paths. For example, consider the NFA shown below:

Figure 3.5: NFA for {a,b}*{abb}

From state 0 on input w = ababb we obtain a tree of computation paths:

Figure 3.6: A tree of computation paths on input ababb

Observe that there are three kinds of computation paths:

1. A path on input w ending in a rejecting state (for example, the leftmost path).
2. A path on some proper prefix of w, along which the computation gets stuck (for example, the rightmost path).
3. A path on input w ending in an accepting state (such as the path ending in state 3).

The acceptance criterion for an NFA is very lenient: a string w is accepted iff the tree of computation paths contains some accepting path (of type (3)). Thus, all failed paths of type (1) and (2) are ignored. Furthermore, there is no charge for failed paths. A string w is rejected iff all computation paths are failed paths of type (1) or (2).

The philosophy of nondeterminism is that an NFA guesses an accepting path and then checks it in polynomial time by following this path. We are only charged for one accepting path (even if there are several accepting paths).

A way to capture this acceptance policy is to extend the transition function δ: Q × (Σ ∪ {ǫ}) → 2^Q to a function

δ*: Q × Σ* → 2^Q.

The presence of ǫ-transitions (i.e., when q ∈ δ(p, ǫ)) causes technical problems, and to overcome these problems, we introduce the notion of ǫ-closure.

3.4 ǫ-closure

Definition 3.6. Given an NFA N = (Q, Σ, δ, q_0, F) (with ǫ-transitions), for every state p ∈ Q, the ǫ-closure of p is the set ǫ-closure(p) consisting of all states q such that there is a path from p to q whose spelling is ǫ (an ǫ-path). This means that either q = p, or that all the edges on the path from p to q have the label ǫ.

We can compute ǫ-closure(p) using a sequence of approximations as follows. Define the sequence of sets of states (ǫ-clo_i(p))_{i≥0} as follows:

ǫ-clo_0(p) = {p},
ǫ-clo_{i+1}(p) = ǫ-clo_i(p) ∪ {q ∈ Q | ∃s ∈ ǫ-clo_i(p), q ∈ δ(s, ǫ)}.

Since ǫ-clo_i(p) ⊆ ǫ-clo_{i+1}(p), ǫ-clo_i(p) ⊆ Q for all i ≥ 0, and Q is finite, it can be shown that there is a smallest i, say i_0, such that

ǫ-clo_{i_0}(p) = ǫ-clo_{i_0+1}(p).

It suffices to show that there is some i ≥ 0 such that ǫ-clo_i(p) = ǫ-clo_{i+1}(p), because then there is a smallest such i (since every nonempty subset of N has a smallest element).

Assume by contradiction that ǫ-clo_i(p) ⊊ ǫ-clo_{i+1}(p) for all i ≥ 0. Then, I claim that |ǫ-clo_i(p)| ≥ i + 1 for all i ≥ 0. This is true for i = 0 since ǫ-clo_0(p) = {p}. Since ǫ-clo_i(p) ⊊ ǫ-clo_{i+1}(p), there is some q ∈ ǫ-clo_{i+1}(p) that does not belong to ǫ-clo_i(p), and since by induction |ǫ-clo_i(p)| ≥ i + 1, we get

|ǫ-clo_{i+1}(p)| ≥ |ǫ-clo_i(p)| + 1 ≥ i + 1 + 1 = i + 2,

establishing the induction hypothesis.

If n = |Q|, then |ǫ-clo_n(p)| ≥ n + 1, a contradiction. Therefore, there is indeed some i ≥ 0 such that ǫ-clo_i(p) = ǫ-clo_{i+1}(p), and for the least such i = i_0, we have i_0 ≤ n − 1.

It can also be shown that

ǫ-closure(p) = ǫ-clo_{i_0}(p)

by proving that

1. ǫ-clo_i(p) ⊆ ǫ-closure(p), for all i ≥ 0;
2. ǫ-closure(p)_i ⊆ ǫ-clo_{i_0}(p), for all i ≥ 0,

where ǫ-closure(p)_i is the set of states reachable from p by an ǫ-path of length ≤ i.

When N has no ǫ-transitions, i.e., when δ(p, ǫ) = ∅ for all p ∈ Q (which means that δ can be viewed as a function δ: Q × Σ → 2^Q), we have

ǫ-closure(p) = {p}.

It should be noted that there are more efficient ways of computing ǫ-closure(p), for example, using a stack (basically, a kind of depth-first search). We present such an algorithm below. It is assumed that the types NFA and stack are defined. If n is the number of states of an NFA N, we let

eclotype = array[1..n] of boolean

function eclosure[N: NFA, p: integer]: eclotype;
begin
  var eclo: eclotype; q, s: integer; st: stack;
  for each q in setstates(N) do
    eclo[q] := false;
  endfor;
  eclo[p] := true;
  st := empty;
  trans := deltatable(N);
  st := push(st, p);
  while st ≠ emptystack do
    q := pop(st);
    for each s in trans(q, ǫ) do
      if eclo[s] = false then
        eclo[s] := true;
        st := push(st, s)

      endif
    endfor
  endwhile;
  eclosure := eclo
end

This algorithm can be easily adapted to compute the set of states reachable from a given state p (in a DFA or an NFA).

Given a subset S of Q, we define ǫ-closure(S) as

ǫ-closure(S) = ⋃_{s∈S} ǫ-closure(s),

with

ǫ-closure(∅) = ∅.

When N has no ǫ-transitions, we have ǫ-closure(S) = S.

We are now ready to define the extension δ*: Q × Σ* → 2^Q of the transition function δ: Q × (Σ ∪ {ǫ}) → 2^Q.
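The stack-based algorithm above can be rendered in a few lines of Python (a sketch with our own data representation: a dictionary mapping each state to the set of its ǫ-successors, with missing entries meaning no ǫ-transitions):

```python
def eclosure(eps, p):
    """All states reachable from p along eps-labeled edges (depth-first search)."""
    eclo, stack = {p}, [p]       # eclo plays the role of the boolean array
    while stack:
        q = stack.pop()
        for s in eps.get(q, set()):
            if s not in eclo:
                eclo.add(s)
                stack.append(s)
    return eclo

# A small hypothetical NFA fragment: 0 -eps-> 1, 1 -eps-> 2, 3 -eps-> 0.
eps = {0: {1}, 1: {2}, 3: {0}}
assert eclosure(eps, 0) == {0, 1, 2}
assert eclosure(eps, 2) == {2}          # no eps-transitions out of 2
assert eclosure(eps, 3) == {0, 1, 2, 3}
```

Each state is pushed at most once, so the running time is linear in the number of ǫ-transitions, as with the boolean-array version.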

3.5 Converting an NFA into a DFA

The intuition behind the definition of the extended transition function is that δ*(p, w) is the set of all states reachable from p by a path whose spelling is w.

Definition 3.7. Given an NFA N = (Q, Σ, δ, q_0, F) (with ǫ-transitions), the extended transition function δ*: Q × Σ* → 2^Q is defined as follows: for every p ∈ Q, every u ∈ Σ*, and every a ∈ Σ,

δ*(p, ǫ) = ǫ-closure({p}),
δ*(p, ua) = ǫ-closure( ⋃_{s ∈ δ*(p,u)} δ(s, a) ).

In the second equation, if δ*(p, u) = ∅ then δ*(p, ua) = ∅.

The language L(N) accepted by an NFA N is the set

L(N) = {w ∈ Σ* | δ*(q_0, w) ∩ F ≠ ∅}.

Observe that the definition of L(N) conforms to the lenient acceptance policy: a string w is accepted iff δ*(q_0, w) contains some final state.

We can also extend δ*: Q × Σ* → 2^Q to a function δ: 2^Q × Σ* → 2^Q defined as follows: for every subset S of Q, for every w ∈ Σ*,

δ(S, w) = ⋃_{s∈S} δ*(s, w),

with δ(∅, w) = ∅.

Let 𝒬 be the subset of 2^Q consisting of those subsets S of Q that are ǫ-closed, i.e., such that

S = ǫ-closure(S).

If we consider the restriction Δ: 𝒬 × Σ → 𝒬 of δ: 2^Q × Σ* → 2^Q to 𝒬 and Σ, we observe that Δ is the transition function of a DFA. Indeed, this is the transition function of a DFA accepting L(N). It is easy to show that Δ is defined directly as follows (on subsets S in 𝒬):

Δ(S, a) = ǫ-closure( ⋃_{s∈S} δ(s, a) ),

with Δ(∅, a) = ∅. Then, the DFA D is defined as follows:

D = (𝒬, Σ, Δ, ǫ-closure({q_0}), 𝓕),

where 𝓕 = {S ∈ 𝒬 | S ∩ F ≠ ∅}.

It is not difficult to show that L(D) = L(N), that is, D is a DFA accepting L(N). For this, we show that

Δ*(S, w) = δ(S, w).

Thus, we have converted the NFA N into a DFA D (and gotten rid of ǫ-transitions).
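The conversion can be sketched as a worklist algorithm in Python (names and representation are ours): starting from ǫ-closure({q_0}), explore each ǫ-closed subset on each input symbol, creating new subset-states as they appear.

```python
def eclosure_set(eps, S):
    """eps-closure of a set of states (frozenset, so it can index a dict)."""
    out, stack = set(S), list(S)
    while stack:
        q = stack.pop()
        for s in eps.get(q, set()):
            if s not in out:
                out.add(s)
                stack.append(s)
    return frozenset(out)

def subset_construction(delta, eps, q0, finals, sigma):
    """Build the DFA whose states are the reachable eps-closed subsets of Q."""
    start = eclosure_set(eps, {q0})
    K, worklist, Delta = {start}, [start], {}
    while worklist:
        S = worklist.pop()
        for a in sigma:
            U = {q for p in S for q in delta.get((p, a), set())}
            T = eclosure_set(eps, U)
            Delta[(S, a)] = T
            if T not in K:
                K.add(T)
                worklist.append(T)
    F = {S for S in K if S & finals}
    return K, Delta, start, F

# Example 4's NFA for {a,b}*{abb} (no eps-transitions).
delta4 = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
K, Delta, start, F = subset_construction(delta4, {}, 0, {3}, "ab")
assert len(K) == 4              # the subsets {0}, {0,1}, {0,2}, {0,3}
assert frozenset({0, 3}) in F
```

Only reachable subset-states are ever created, which is exactly the "more economical construction" presented next.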

Since DFA's are special NFA's, the subset construction shows that DFA's and NFA's accept the same family of languages, the regular languages, version 1 (although not with the same complexity).

The states of the DFA D equivalent to N are ε-closed subsets of Q. For this reason, the above construction is often called the subset construction. This construction is due to Rabin and Scott.

Although theoretically fine, the method may construct useless sets S that are not reachable from the start state ε-closure({q0}). A more economical construction is given next.

An algorithm to convert an NFA into a DFA: the subset construction

Given an input NFA N = (Q, Σ, δ, q0, F), a DFA D = (K, Σ, Δ, S0, ℱ) is constructed. It is assumed that K is a linear array of sets of states S ⊆ Q, and Δ is a 2-dimensional array, where Δ[i, a] is the index of the target state of the transition from K[i] = S on input a, with S ∈ K and a ∈ Σ.

S0 := ε-closure({q0}); total := 1; K[1] := S0;
marked := 0;
while marked < total do
  marked := marked + 1; S := K[marked];
  for each a ∈ Σ do
    U := ⋃_{s∈S} δ(s, a); T := ε-closure(U);
    if T ∉ K then
      total := total + 1; K[total] := T
    endif;
    Δ[marked, a] := index(T)
  endfor
endwhile;
ℱ := {S ∈ K | S ∩ F ≠ ∅}

Let us illustrate the subset construction on the NFA of Example 4, an NFA for the language L3 = {a,b}*{abb}. Transition table δ4:
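The pseudocode above translates almost line for line into Python. The representation (a dict δ keyed by (state, symbol) pairs, frozensets for the ε-closed subsets, 'eps' for ε-transitions) is an illustrative assumption; the example run uses the NFA for {a,b}*{abb} of Figure 3.7.

```python
def eclosure(delta, states):
    """Epsilon-closure of a set of states, returned as a frozenset."""
    eclo = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, 'eps'), set()):
            if t not in eclo:
                eclo.add(t)
                stack.append(t)
    return frozenset(eclo)

def subset_construction(delta, sigma, q0, final):
    """Build the reachable part of the DFA, as in the pseudocode above:
    K is the list of discovered epsilon-closed subsets (marked states
    are those already processed), D maps (subset, symbol) to a subset."""
    S0 = eclosure(delta, {q0})
    K, D = [S0], {}
    marked = 0
    while marked < len(K):                 # while marked < total
        S = K[marked]
        marked += 1
        for a in sigma:
            U = set()
            for s in S:
                U |= delta.get((s, a), set())
            T = eclosure(delta, U)
            if T not in K:
                K.append(T)
            D[(S, a)] = T
    F = [S for S in K if S & final]
    return K, D, S0, F

# NFA for {a,b}* {abb} (Figure 3.7): delta(0,a)={0,1}, delta(0,b)={0},
# delta(1,b)={2}, delta(2,b)={3}; accepting state 3.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
K, D, S0, F = subset_construction(delta, 'ab', 0, {3})
```

On this input the algorithm discovers exactly the four subsets {0}, {0,1}, {0,2}, {0,3} of the worked trace below, with {0,3} the only accepting DFA state.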

Set of accepting states: F4 = {3}.

     a      b
0    {0,1}  {0}
1           {2}
2           {3}
3

Figure 3.7: NFA for {a,b}*{abb}

The pointer ⇒ corresponds to marked and the pointer → to total.

Initial transition table, just after entering the while loop:

index  states  a  b
A      {0}

After the first round through the while loop, and just after reentering it:

index  states  a  b
A      {0}     B  A
B      {0,1}

After the second round through the while loop:

index  states  a  b
A      {0}     B  A
B      {0,1}   B  C
C      {0,2}

After the third round through the while loop:

index  states  a  b
A      {0}     B  A
B      {0,1}   B  C
C      {0,2}   B  D
D      {0,3}

After the fourth round through the while loop:

index  states  a  b
A      {0}     B  A
B      {0,1}   B  C
C      {0,2}   B  D
D      {0,3}   B  A

This is the DFA of Figure 3.3, except that in that example A, B, C, D are renamed 0, 1, 2, 3.

Figure 3.8: DFA for {a,b}*{abb}

3.6 Finite State Automata With Output: Transducers

So far, we have only considered automata that recognize languages, i.e., automata that do not produce any output on any input (except "accept" or "reject"). It is interesting and useful to consider input/output finite state machines. Such automata are called transducers. They compute functions or relations. First, we define a deterministic kind of transducer.

Definition 3.8. A general sequential machine (gsm) is a sextuple M = (Q, Σ, Δ, δ, λ, q0), where

(1) Q is a finite set of states,

(2) Σ is a finite input alphabet,

(3) Δ is a finite output alphabet,

(4) δ: Q × Σ → Q is the transition function,

(5) λ: Q × Σ → Δ* is the output function and

(6) q0 is the initial (or start) state.

If λ(p, a) ≠ ε for all p ∈ Q and all a ∈ Σ, then M is nonerasing. If λ(p, a) ∈ Δ for all p ∈ Q and all a ∈ Σ, we say that M is a complete sequential machine (csm).

An example of a gsm for which Σ = {a,b} and Δ = {0,1,2} is shown in Figure 3.9.

Figure 3.9: Example of a gsm

In order to define how a gsm works, we extend the transition and the output functions. We define δ*: Q × Σ* → Q and λ*: Q × Σ* → Δ* recursively as follows: for all p ∈ Q, all u ∈ Σ* and all a ∈ Σ,

δ*(p, ε) = p
δ*(p, ua) = δ(δ*(p, u), a)
λ*(p, ε) = ε
λ*(p, ua) = λ*(p, u)λ(δ*(p, u), a).

For any w ∈ Σ*, we let

M(w) = λ*(q0, w),

and for any L ⊆ Σ* and L' ⊆ Δ*, let

M(L) = {λ*(q0, w) | w ∈ L}
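Because a gsm is deterministic, the extended functions δ* and λ* can be computed by a single left-to-right pass over the input. The sketch below assumes a dictionary encoding of δ and λ; the one-state machine used as an example is hypothetical (it is not the gsm of Figure 3.9) and is a nonerasing gsm, but not a csm, since each input letter emits two output letters.

```python
def gsm_output(delta, lam, q0, w):
    """Compute M(w) = lambda*(q0, w) by iterating the recursive
    definitions above: at each step, emit lambda(p, a), then move
    to delta(p, a)."""
    p, out = q0, ""
    for a in w:
        out += lam[(p, a)]
        p = delta[(p, a)]
    return out

# A toy one-state gsm (hypothetical): it converts a -> 00 and b -> 11.
delta = {(0, 'a'): 0, (0, 'b'): 0}
lam = {(0, 'a'): "00", (0, 'b'): "11"}
```

For instance, gsm_output(delta, lam, 0, "ab") yields "0011". A one-state gsm like this one realizes exactly a homomorphism, which is the observation made in the next paragraph of the text.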

and

M^{-1}(L') = {w ∈ Σ* | λ*(q0, w) ∈ L'}.

Note that if M is a csm, then |M(w)| = |w| for all w ∈ Σ*. Also, a homomorphism is a special kind of gsm: it can be realized by a gsm with one state.

We can use gsm's and csm's to compute certain kinds of functions.

Definition 3.9. A function f: Σ* → Δ* is a gsm (resp. csm) mapping iff there is a gsm (resp. csm) M so that M(w) = f(w), for all w ∈ Σ*.

Remark: Ginsburg and Rose (1966) characterized gsm mappings as follows: A function f: Σ* → Δ* is a gsm mapping iff

(a) f preserves prefixes, i.e., f(x) is a prefix of f(xy);

(b) there is an integer m such that for all w ∈ Σ* and all a ∈ Σ, we have |f(wa)| − |f(w)| ≤ m;

(c) f(ε) = ε;

(d) for every regular language R ⊆ Δ*, the language f^{-1}(R) = {w ∈ Σ* | f(w) ∈ R} is regular.

A function f: Σ* → Δ* is a csm mapping iff f satisfies (a) and (d), and for all w ∈ Σ*, |f(w)| = |w|.

The following proposition is left as a homework problem.

Proposition 3.1. The family of regular languages (over an alphabet Σ) is closed under both gsm and inverse gsm mappings.

We can generalize the gsm model so that

(1) the device is nondeterministic,

(2) the device has a set of accepting states,

(3) transitions are allowed to occur without new input being processed,

(4) transitions are defined for input strings instead of individual letters.

Here is the definition of such a model, the a-transducer. A much more powerful model of transducer will be investigated later: the Turing machine.

Definition 3.10. An a-transducer (or nondeterministic sequential transducer with accepting states) is a sextuple M = (K, Σ, Δ, λ, q0, F), where

(1) K is a finite set of states,

(2) Σ is a finite input alphabet,

(3) Δ is a finite output alphabet,

(4) q0 ∈ K is the start (or initial) state,

(5) F ⊆ K is the set of accepting (or final) states and

(6) λ ⊆ K × Σ* × Δ* × K is a finite set of quadruples called the transition function of M.

If λ ⊆ K × Σ+ × Δ* × K, then M is ε-free.

Clearly, a gsm is a special kind of a-transducer.

An a-transducer defines a binary relation between Σ* and Δ*, or equivalently, a function M: Σ* → 2^{Δ*}. We can explain what this function is by describing how an a-transducer makes a sequence of moves from configurations to configurations.

The current configuration of an a-transducer is described by a triple

(p, u, v) ∈ K × Σ* × Δ*,

where p is the current state, u is the remaining input, and v is the output produced so far.

We define the binary relation ⊢_M on K × Σ* × Δ* as follows: for all p, q ∈ K, u, α ∈ Σ*, β, v ∈ Δ*, if (p, u, v, q) ∈ λ, then

(p, uα, β) ⊢_M (q, α, βv).

Let ⊢*_M be the transitive and reflexive closure of ⊢_M.
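The move relation just defined can be simulated by a breadth-first search over configurations (p, u, v): the outputs collected in accepting states once the input is consumed form the set M(w) defined next. The sketch below assumes the quadruples are given as a Python list and restricts itself to ε-free transducers (every quadruple reads a nonempty input string) so that the search terminates; the three-quadruple example machine is hypothetical.

```python
from collections import deque

def a_transducer_outputs(trans, q0, final, w):
    """Compute M(w) for an epsilon-free a-transducer given by quadruples
    (p, u, v, q): from configuration (p, u·alpha, beta) one may move to
    (q, alpha, beta·v). Plain BFS over reachable configurations."""
    results, seen = set(), set()
    queue = deque([(q0, w, "")])
    while queue:
        p, rest, out = queue.popleft()
        if (p, rest, out) in seen:
            continue
        seen.add((p, rest, out))
        if rest == "" and p in final:
            results.add(out)
        for (p1, u, v, q) in trans:
            if p1 == p and u != "" and rest.startswith(u):
                queue.append((q, rest[len(u):], out + v))
    return results

# Toy machine: in state 0, read "ab" and emit "x", or read "a" and emit
# "y"; in accepting state 1, read "b" and emit "z".
trans = [(0, "ab", "x", 1), (0, "a", "y", 1), (1, "b", "z", 1)]
```

On input "ab" this machine has two accepting computations, so M("ab") = {"x", "yz"}, illustrating that M(w) is a set of outputs rather than a single output.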

The function M: Σ* → 2^{Δ*} is defined such that for every w ∈ Σ*,

M(w) = {y ∈ Δ* | (q0, w, ε) ⊢*_M (f, ε, y), f ∈ F}.

For any language L ⊆ Σ*, let

M(L) = ⋃_{w∈L} M(w).

For any y ∈ Δ*, let

M^{-1}(y) = {w ∈ Σ* | y ∈ M(w)},

and for any language L' ⊆ Δ*, let

M^{-1}(L') = ⋃_{y∈L'} M^{-1}(y).

Remark: Notice that if w ∈ M^{-1}(L'), then there exists some y ∈ L' such that w ∈ M^{-1}(y), i.e., y ∈ M(w). This does not imply that M(w) ⊆ L', only that M(w) ∩ L' ≠ ∅.

One should realize that for any L' ⊆ Δ* and any a-transducer M, there is some a-transducer M' (from Δ* to 2^{Σ*}) so that M'(L') = M^{-1}(L').

The following proposition is left as a homework problem:

Proposition 3.2. The family of regular languages (over an alphabet Σ) is closed under both a-transductions and inverse a-transductions.

3.7 An Application of NFA's: Text Search

A common problem in the age of the Web (and on-line text repositories) is the following: given a set of words, called the keywords, find all the documents that contain one (or all) of those words.

Search engines are a popular example of this process. Search engines use inverted indexes (for each word appearing on the Web, a list of all the places where that word occurs is stored). However, there are applications that are unsuited for inverted indexes, but are good for automaton-based techniques. Some text-processing programs, such as advanced forms of the UNIX grep command (such as egrep or fgrep), are based on automaton-based techniques. The characteristics that make an application suitable for searches that use automata are:

(1) The repository on which the search is conducted is rapidly changing.

(2) The documents to be searched cannot be catalogued. For example, Amazon.com creates pages on the fly in response to queries.

We can use an NFA to find occurrences of a set of keywords in a text. This NFA signals by entering a final state that it has seen one of the keywords. The form of such an NFA is special.

(1) There is a start state, q0, with a transition to itself on every input symbol from the alphabet, Σ.

(2) For each keyword, w = w1···wk (with wi ∈ Σ), there are k states, q1^{(w)}, ..., qk^{(w)}, and there is a transition from q0 to q1^{(w)} on input w1, a transition from q1^{(w)} to q2^{(w)} on input w2, and so on, until a transition from q_{k−1}^{(w)} to qk^{(w)} on input wk. The state qk^{(w)} is an accepting state and indicates that the keyword w = w1···wk has been found.

The NFA constructed above can then be converted to a DFA using the subset construction.

Here is an example where Σ = {a,b} and the set of keywords is {aba, ab, ba}.

Figure 3.10: NFA for the keywords aba, ab, ba.

Applying the subset construction to the NFA, we obtain the DFA whose transition table is:

index  states                                             a  b
0      {q0}                                               1  2
1      {q0, q1^{aba}, q1^{ab}}                            1  3
2      {q0, q1^{ba}}                                      4  2
3      {q0, q1^{ba}, q2^{aba}, q2^{ab}}                   5  2
4      {q0, q1^{aba}, q1^{ab}, q2^{ba}}                   1  3
5      {q0, q1^{aba}, q1^{ab}, q2^{ba}, q3^{aba}}         1  3

The final states are: 3, 4, 5.

Figure 3.11: DFA for the keywords aba, ab, ba.

The good news is that, due to the very special structure of the NFA, the number of states of the corresponding DFA is at most the number of states of the original NFA!

We find that the states of the DFA are (check it yourself!):

(1) The set {q0}, associated with the start state q0 of the NFA.

(2) For any state p ≠ q0 of the NFA reached from q0 along a path corresponding to a string u = u1···um, the set consisting of:

(a) q0,

(b) p,

(c) the set of all states q of the NFA reachable from q0 by following a path whose symbols form a nonempty suffix of u, i.e., a string of the form uj uj+1 ··· um.

As a consequence, we get an efficient (w.r.t. time and space) method to recognize a set of keywords. In fact, this DFA recognizes leftmost occurrences of keywords in a text (we can stop as soon as we enter a final state).
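As a sketch of this construction, one can simulate the keyword NFA directly, keeping the current set of NFA states while scanning the text; this is exactly the subset-construction DFA computed lazily. The encoding of the chain states as pairs (k, i), meaning "the first i letters of keyword k have just been matched", is an assumption made for illustration.

```python
def scan_keywords(keywords, text):
    """Return the positions in `text` at which some keyword ends, by
    simulating the keyword NFA of this section: the start state loops
    on every symbol, and state (k, i) advances on keywords[k][i]."""
    hits = []
    current = {("q0", 0)}
    for pos, c in enumerate(text):
        nxt = {("q0", 0)}                  # self-loop on the start state
        for (k, i) in current:
            if k == "q0":
                for j, w in enumerate(keywords):
                    if w[0] == c:
                        nxt.add((j, 1))
            elif i < len(keywords[k]) and keywords[k][i] == c:
                nxt.add((k, i + 1))
        current = nxt
        if any(k != "q0" and i == len(keywords[k]) for (k, i) in current):
            hits.append(pos)
    return hits
```

On the keywords {aba, ab, ba} of the example, scanning "abab" reports hits ending at positions 1, 2 and 3, and to detect only the leftmost occurrence one simply stops at the first hit.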


Chapter 4

Hidden Markov Models (HMMs)

4.1 Hidden Markov Models (HMMs)

There is a variant of the notion of DFA with output, for example a transducer such as a gsm (generalized sequential machine), which is widely used in machine learning. This machine model is known as a hidden Markov model, for short HMM. These notes are only an introduction to HMMs and are by no means complete. For more comprehensive presentations of HMMs, see the references at the end of this chapter.

There are three new twists compared to traditional gsm models:

(1) There is a finite set of states Q with n elements, a bijection σ: Q → {1, ..., n}, and the transitions between states are labeled with probabilities rather than symbols from an alphabet. For any two states p and q in Q, the edge from p to q is labeled with a probability A(i, j), with i = σ(p) and j = σ(q). The probabilities A(i, j) form an n × n matrix A = (A(i, j)).

(2) There is a finite set O of size m (called the observation space) of possible outputs that can be emitted, a bijection ω: O → {1, ..., m}, and for every state q ∈ Q, there is a probability B(i, j) that output O ∈ O is emitted (produced), with i = σ(q) and j = ω(O). The probabilities B(i, j) form an n × m matrix B = (B(i, j)).

(3) Sequences of outputs O = (O1, ..., OT) (with Ot ∈ O for t = 1, ..., T) emitted by the model are directly observable, but the sequences of states S = (q1, ..., qT) (with qt ∈ Q for t = 1, ..., T) that caused some sequence of outputs to be emitted are not observable. In this sense the states are hidden, and this is the reason for calling this model a hidden Markov model.

Remark: We could define a state transition probability function A: Q × Q → [0,1] by A(p, q) = A(σ(p), σ(q)), and a state observation probability function B: Q × O → [0,1] by B(p, O) = B(σ(p), ω(O)). The function A conveys exactly the same amount of information

as the matrix A, and the function B conveys exactly the same amount of information as the matrix B. The only difference is that the arguments of A are states rather than integers, so in that sense it is perhaps more natural. We can think of the function A as an implementation of the matrix A. Similarly, the arguments of B are states and outputs rather than integers. Again, we can think of the function B as an implementation of the matrix B. Most of the literature is rather sloppy about this. We will use matrices.

Before going any further, we wish to address a notational issue that everyone who writes about state-processes faces. This issue is a bit of a headache which needs to be resolved to avoid a lot of confusion. The issue is how to denote the states, the outputs, as well as (ordered) sequences of states and sequences of outputs.

In most problems, states and outputs have meaningful names. For example, if we wish to describe the evolution of the temperature from day to day, it makes sense to use two states Cold and Hot, and to describe whether a given individual has a drink by D, and no drink by N. Thus our set of states is Q = {Cold, Hot}, and our set of outputs is O = {N, D}.

However, when computing probabilities, we need to use matrices whose rows and columns are indexed by positive integers, so we need a mechanism to associate a numerical index to every state and to every output, and this is the purpose of the bijections σ: Q → {1, ..., n} and ω: O → {1, ..., m}. In our example, we define σ by σ(Cold) = 1 and σ(Hot) = 2, and ω by ω(N) = 1 and ω(D) = 2.

Some authors circumvent (or do they?) this notational issue by assuming that the set of outputs is O = {1, 2, ..., m}, and that the set of states is Q = {1, 2, ..., n}. The disadvantage of doing this is that in real situations, it is often more convenient to name the outputs and the states with more meaningful names than 1, 2, 3, etc.

With respect to this, Mitch Marcus pointed out to me that the task of naming the elements of the output alphabet can be challenging, for example in speech recognition.

Let us now turn to sequences.
For example, consider the sequence of six states (from the set Q = {Cold, Hot}),

S = (Cold, Cold, Hot, Cold, Hot, Hot).

Using the bijection σ: {Cold, Hot} → {1, 2} defined above, the sequence S is completely determined by the sequence of indices

σ(S) = (σ(Cold), σ(Cold), σ(Hot), σ(Cold), σ(Hot), σ(Hot)) = (1, 1, 2, 1, 2, 2).

More generally, we will denote a sequence of length T ≥ 1 of states from a set Q of size n by

S = (q1, q2, ..., qT),

with qt ∈ Q for t = 1, ..., T. Using the bijection σ: Q → {1, ..., n}, the sequence S is completely determined by the sequence of indices

σ(S) = (σ(q1), σ(q2), ..., σ(qT)),

where σ(qt) is some index from the set {1, ..., n}, for t = 1, ..., T. The problem now is, what is a better notation for the index denoted by σ(qt)? Of course, we could use σ(qt), but this is a heavy notation, so we adopt the notational convention to denote the index σ(qt) by it.¹

Going back to our example

S = (q1, q2, q3, q4, q5, q6) = (Cold, Cold, Hot, Cold, Hot, Hot),

we have

σ(S) = (σ(q1), σ(q2), σ(q3), σ(q4), σ(q5), σ(q6)) = (1, 1, 2, 1, 2, 2),

so the sequence of indices (i1, i2, i3, i4, i5, i6) = (σ(q1), σ(q2), σ(q3), σ(q4), σ(q5), σ(q6)) is given by

σ(S) = (i1, i2, i3, i4, i5, i6) = (1, 1, 2, 1, 2, 2).

So the fourth index i4 has the value 1.

We apply a similar convention to sequences of outputs. For example, consider the sequence of six outputs (from the set O = {N, D}),

O = (N, D, N, N, N, D).

Using the bijection ω: {N, D} → {1, 2} defined above, the sequence O is completely determined by the sequence of indices

ω(O) = (ω(N), ω(D), ω(N), ω(N), ω(N), ω(D)) = (1, 2, 1, 1, 1, 2).

More generally, we will denote a sequence of length T ≥ 1 of outputs from a set O of size m by

O = (O1, O2, ..., OT),

with Ot ∈ O for t = 1, ..., T. Using the bijection ω: O → {1, ..., m}, the sequence O is completely determined by the sequence of indices

ω(O) = (ω(O1), ω(O2), ..., ω(OT)),

where ω(Ot) is some index from the set {1, ..., m}, for t = 1, ..., T. This time, we adopt the notational convention to denote the index ω(Ot) by ωt.

Going back to our example

O = (O1, O2, O3, O4, O5, O6) = (N, D, N, N, N, D),

¹We contemplated using the notation σt for σ(qt) instead of it. However, we feel that this would deviate too much from the common practice found in the literature, which uses the notation it. This is not to say that the literature is free of horribly confusing notation!

we have

ω(O) = (ω(O1), ω(O2), ω(O3), ω(O4), ω(O5), ω(O6)) = (1, 2, 1, 1, 1, 2),

so the sequence of indices (ω1, ω2, ω3, ω4, ω5, ω6) = (ω(O1), ω(O2), ω(O3), ω(O4), ω(O5), ω(O6)) is given by

ω(O) = (ω1, ω2, ω3, ω4, ω5, ω6) = (1, 2, 1, 1, 1, 2).

Remark: What is very confusing is this: to assume that our state set is Q = {q1, ..., qn}, and to denote a sequence of states of length T as S = (q1, q2, ..., qT). The symbol q1 in the sequence S may actually refer to q3 in Q, etc. We feel that the explicit introduction of the bijections σ: Q → {1, ..., n} and ω: O → {1, ..., m}, although not standard in the literature, yields a mathematically clean way to deal with sequences which is not too cumbersome, although this latter point is a matter of taste.

HMMs are among the most effective tools to solve the following types of problems:

(1) DNA and protein sequence alignment in the face of mutations and other kinds of evolutionary change.

(2) Speech understanding, also called automatic speech recognition. When we talk, our mouths produce sequences of sounds from the sentences that we want to say. This process is complex. Multiple words may map to the same sound, words are pronounced differently as a function of the word before and after them, we all form sounds slightly differently, and so on. All a listener can hear (perhaps a computer system) is the sequence of sounds, and the listener would like to reconstruct the mapping (backward) in order to determine what words we were attempting to say. For example, when you talk to your TV to pick a program, say "game of thrones", you don't want to get "Jessica Jones".

(3) Optical character recognition (OCR). When we write, our hands map from an idealized symbol to some set of marks on a page (or screen). The marks are observable, but the process that generates them isn't. A system performing OCR, such as a system used by the post office to read addresses, must discover which word is most likely to correspond to the mark it reads.

Here is an example illustrating the notion of HMM.

Example 4.1.
Say we consider the following behavior of some professor at some university. On a hot day (denoted by Hot), the professor comes to class with a drink (denoted D) with probability 0.7, and with no drink (denoted N) with probability 0.3. On the other hand, on

a cold day (denoted Cold), the professor comes to class with a drink with probability 0.2, and with no drink with probability 0.8.

Suppose a student intrigued by this behavior recorded a sequence showing whether the professor came to class with a drink or not, say NNND. Several months later, the student would like to know whether the weather was hot or cold the days he recorded the drinking behavior of the professor.

Now the student heard about machine learning, so he constructs a probabilistic (hidden Markov) model of the weather. Based on some experiments, he determines the probability of going from a hot day to another hot day to be 0.75, the probability of going from a hot day to a cold day to be 0.25, the probability of going from a cold day to another cold day to be 0.7, and the probability of going from a cold day to a hot day to be 0.3. He also knows that when he started his observations, it was a cold day with probability 0.45, and a hot day with probability 0.55.

In this example, the set of states is Q = {Cold, Hot}, and the set of outputs is O = {N, D}. We have the bijection σ: {Cold, Hot} → {1, 2} given by σ(Cold) = 1 and σ(Hot) = 2, and the bijection ω: {N, D} → {1, 2} given by ω(N) = 1 and ω(D) = 2.

The above data determine an HMM depicted in Figure 4.1.

Figure 4.1: Example of an HMM modeling the drinking behavior of a professor at the University of Pennsylvania.

The portion of the state diagram involving the states Cold, Hot, is analogous to an NFA in which the transition labels are probabilities; it is the underlying Markov model of the HMM. For any given state, the probabilities on the outgoing edges sum to 1. The start state is a convenient way to express the probabilities of starting either in state Cold or in state

Hot. Also, from each of the states Cold and Hot, we have emission probabilities of producing the output N or D, and these probabilities also sum to 1.

We can also express these data using matrices. The matrix

A = ( 0.7   0.3  )
    ( 0.25  0.75 )

describes the transitions of the Markov model, the vector

π = (0.45, 0.55)

describes the probabilities of starting either in state Cold or in state Hot, and the matrix

B = ( 0.8  0.2 )
    ( 0.3  0.7 )

describes the emission probabilities. Observe that the rows of the matrices A and B sum to 1. Such matrices are called row-stochastic matrices. The entries in the vector π also sum to 1.

The student would like to solve what is known as the decoding problem. Namely, given the output sequence NNND, find the most likely state sequence of the Markov model that produces the output sequence NNND. Is it (Cold, Cold, Cold, Cold), or (Hot, Hot, Hot, Hot), or (Hot, Cold, Cold, Hot), or (Cold, Cold, Cold, Hot)? Given the probabilities of the HMM, it seems unlikely that it is (Hot, Hot, Hot, Hot), but how can we find the most likely one?

Let us consider another example taken from Stamp [19].

Example 4.2. Suppose we want to determine the average annual temperature at a particular location over a series of years in a distant past where thermometers did not exist. Since we can't go back in time, we look for indirect evidence of the temperature, say in terms of the size of tree growth rings. For simplicity, assume that we consider the two temperatures Cold and Hot, and three different sizes of tree rings: small, medium and large, which we denote by S, M, L.

In this example, the set of states is Q = {Cold, Hot}, and the set of outputs is O = {S, M, L}. We have the bijection σ: {Cold, Hot} → {1, 2} given by σ(Cold) = 1 and σ(Hot) = 2, and the bijection ω: {S, M, L} → {1, 2, 3} given by ω(S) = 1, ω(M) = 2, and ω(L) = 3.

The HMM shown in Figure 4.2 is a model of the situation. Suppose we observe the sequence of tree growth rings (S, M, S, L). What is the most likely sequence of temperatures over a four-year period which yields the observations (S, M, S, L)?

Figure 4.2: Example of an HMM modeling the temperature in terms of tree growth rings.

Going back to Example 4.1, we need to figure out the probability that a sequence of states S = (q1, q2, ..., qT) produces the output sequence O = (O1, O2, ..., OT).

Then the probability that we want is just the product of the probability that we begin with state q1, times the product of the probabilities of each of the transitions, times the product of the emission probabilities. With our notational conventions, σ(qt) = it and ω(Ot) = ωt, so we have

Pr(S, O) = π(i1)B(i1, ω1) ∏_{t=2}^{T} A(i_{t−1}, it)B(it, ωt).

In our example, ω(O) = (ω1, ω2, ω3, ω4) = (1, 1, 1, 2), which corresponds to NNND. The brute-force method is to compute these probabilities for all 2⁴ = 16 sequences of states of length 4 (in general, there are n^T sequences of length T). For example, for the sequence S = (Cold, Cold, Cold, Hot), associated with the sequence of indices σ(S) = (i1, i2, i3, i4) = (1, 1, 1, 2), we find that

Pr(S, NNND) = π(1)B(1,1)A(1,1)B(1,1)A(1,1)B(1,1)A(1,2)B(2,2)
            = 0.45 × 0.8 × 0.7 × 0.8 × 0.7 × 0.8 × 0.3 × 0.7 ≈ 0.0237.

A much more efficient way to proceed is to use a method based on dynamic programming. Recall the bijection σ: {Cold, Hot} → {1, 2}, so that we will refer to the state Cold as 1, and to the state Hot as 2. For t = 1, 2, 3, 4, for every state i = 1, 2, we compute score(i, t) to be the highest probability that a sequence of length t ending in state i produces the output sequence (O1, ..., Ot), and for t ≥ 2, we let pred(i, t) be the state that precedes state i in a best sequence of length t ending in i.
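The brute-force method is easy to express in code. The sketch below encodes the matrices of Example 4.1 with 0-based indices (so state 0 is Cold, state 1 is Hot, output 0 is N, output 1 is D, an encoding choice made for illustration) and enumerates all 2⁴ = 16 state sequences:

```python
from itertools import product

A  = [[0.7, 0.3], [0.25, 0.75]]   # state transition matrix
B  = [[0.8, 0.2], [0.3, 0.7]]     # emission matrix
pi = [0.45, 0.55]                  # initial distribution

def joint_prob(states, obs):
    """Pr(S, O) = pi(i1) B(i1, w1) * prod_t A(i_{t-1}, i_t) B(i_t, w_t)."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p

obs = [0, 0, 0, 1]                 # the observation sequence N N N D
best = max(product([0, 1], repeat=4), key=lambda s: joint_prob(s, obs))
# best == (0, 0, 0, 1), i.e. (Cold, Cold, Cold, Hot), with Pr ~ 0.0237
```

The enumeration confirms the hand computation above, but its cost grows as n^T; the dynamic programming method described next avoids this blow-up.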

Recall that in our example, ω(O) = (ω1, ω2, ω3, ω4) = (1, 1, 1, 2), which corresponds to NNND.

Initially, we set

score(j, 1) = π(j)B(j, ω1), j = 1, 2,

and since ω1 = 1 we get score(1, 1) = 0.45 × 0.8 = 0.36, which is the probability of starting in state Cold and emitting N, and score(2, 1) = 0.55 × 0.3 = 0.165, which is the probability of starting in state Hot and emitting N.

Next we compute score(1, 2) and score(2, 2) as follows. For j = 1, 2, for i = 1, 2, compute temporary scores

tscore(i, j) = score(i, 1)A(i, j)B(j, ω2);

then pick the best of the temporary scores,

score(j, 2) = max_i tscore(i, j).

Since ω2 = 1, we get tscore(1, 1) = 0.36 × 0.7 × 0.8 = 0.2016, tscore(2, 1) = 0.165 × 0.25 × 0.8 = 0.033, and tscore(1, 2) = 0.36 × 0.3 × 0.3 = 0.0324, tscore(2, 2) = 0.165 × 0.75 × 0.3 ≈ 0.0371. Then

score(1, 2) = max{tscore(1, 1), tscore(2, 1)} = max{0.2016, 0.033} = 0.2016,

which is the largest probability that a sequence of two states emitting the output (N, N) ends in state Cold, and

score(2, 2) = max{tscore(1, 2), tscore(2, 2)} = max{0.0324, 0.0371} = 0.0371,

which is the largest probability that a sequence of two states emitting the output (N, N) ends in state Hot. Since the state that leads to the optimal score score(1, 2) is 1, we let pred(1, 2) = 1, and since the state that leads to the optimal score score(2, 2) is 2, we let pred(2, 2) = 2.

We compute score(1, 3) and score(2, 3) in a similar way. For j = 1, 2, for i = 1, 2, compute

tscore(i, j) = score(i, 2)A(i, j)B(j, ω3);

then pick the best of the temporary scores,

score(j, 3) = max_i tscore(i, j).

Since ω3 = 1, we get tscore(1, 1) = 0.2016 × 0.7 × 0.8 ≈ 0.1129, tscore(2, 1) = 0.0371 × 0.25 × 0.8 ≈ 0.0074, and tscore(1, 2) = 0.2016 × 0.3 × 0.3 ≈ 0.0181, tscore(2, 2) = 0.0371 × 0.75 × 0.3 ≈ 0.0084. Then

score(1, 3) = max{tscore(1, 1), tscore(2, 1)} = max{0.1129, 0.0074} = 0.1129,

which is the largest probability that a sequence of three states emitting the output (N, N, N) ends in state Cold, and

score(2, 3) = max{tscore(1, 2), tscore(2, 2)} = max{0.0181, 0.0084} = 0.0181,

which is the largest probability that a sequence of three states emitting the output (N, N, N) ends in state Hot. We also get pred(1, 3) = 1 and pred(2, 3) = 1.

Finally, we compute score(1, 4) and score(2, 4) in a similar way. For j = 1, 2, for i = 1, 2, compute

tscore(i, j) = score(i, 3)A(i, j)B(j, ω4);

then pick the best of the temporary scores,

score(j, 4) = max_i tscore(i, j).

Since ω4 = 2, we get tscore(1, 1) = 0.1129 × 0.7 × 0.2 ≈ 0.0158, tscore(2, 1) = 0.0181 × 0.25 × 0.2 ≈ 0.0009, and tscore(1, 2) = 0.1129 × 0.3 × 0.7 ≈ 0.0237, tscore(2, 2) = 0.0181 × 0.75 × 0.7 ≈ 0.0095. Then

score(1, 4) = max{tscore(1, 1), tscore(2, 1)} = max{0.0158, 0.0009} = 0.0158,

which is the largest probability that a sequence of four states emitting the output (N, N, N, D) ends in state Cold, and

score(2, 4) = max{tscore(1, 2), tscore(2, 2)} = max{0.0237, 0.0095} = 0.0237,

which is the largest probability that a sequence of four states emitting the output (N, N, N, D) ends in state Hot, and pred(1, 4) = 1 and pred(2, 4) = 1.

Since max{score(1, 4), score(2, 4)} = max{0.0158, 0.0237} = 0.0237, the state with the maximum score is Hot, and by following the predecessor list (also called a backpointer list), we find that the most likely state sequence to produce the output sequence NNND is (Cold, Cold, Cold, Hot).

The stages of the computations of score(j, t) for j = 1, 2 and t = 1, 2, 3, 4 can be recorded in the following diagram called a lattice, or a trellis (which means lattice in French!):

        t = 1    t = 2    t = 3    t = 4
Cold    0.36     0.2016   0.1129   0.0158
Hot     0.165    0.0371   0.0181   0.0237

Double arrows represent the predecessor edges. For example, the predecessor pred(2, 3) of the third node on the bottom row, labeled with the score 0.0181 (which corresponds to
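The dynamic programming computation carried out by hand above can be packaged as a short function; the 0-based encoding (Cold = 0, Hot = 1, N = 0, D = 1) and list-of-lists matrices are an assumption for illustration:

```python
def viterbi(A, B, pi, obs):
    """score[j][t] is the highest probability of a state sequence of
    length t+1 ending in state j that emits obs[:t+1]; pred records
    the best predecessor of each node for backtracking."""
    n, T = len(A), len(obs)
    score = [[0.0] * T for _ in range(n)]
    pred  = [[0] * T for _ in range(n)]
    for j in range(n):
        score[j][0] = pi[j] * B[j][obs[0]]
    for t in range(1, T):
        for j in range(n):
            tscores = [score[i][t - 1] * A[i][j] * B[j][obs[t]]
                       for i in range(n)]
            best = max(range(n), key=lambda i: tscores[i])
            score[j][t], pred[j][t] = tscores[best], best
    last = max(range(n), key=lambda j: score[j][T - 1])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(pred[path[-1]][t])
    return list(reversed(path)), score[last][T - 1]

A  = [[0.7, 0.3], [0.25, 0.75]]
B  = [[0.8, 0.2], [0.3, 0.7]]
pi = [0.45, 0.55]
path, p = viterbi(A, B, pi, [0, 0, 0, 1])   # observations N N N D
# path == [0, 0, 0, 1], i.e. (Cold, Cold, Cold, Hot), p ~ 0.0237
```

The score table this function fills in is exactly the trellis above; the backtracking loop follows the pred entries, i.e., the double arrows.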

Hot), is the second node on the first row, labeled with the score 0.2016 (which corresponds to Cold). The two incoming arrows to the third node on the bottom row are labeled with the temporary scores 0.0181 and 0.0084. The node with the highest score at time t = 4 is Hot, with score 0.0237 (shown in bold), and by following the double arrows backward from this node, we obtain the most likely state sequence (Cold, Cold, Cold, Hot).

The method we just described is known as the Viterbi algorithm.

We now define HMMs in general, and then present the Viterbi algorithm.

Definition 4.1. A hidden Markov model, for short HMM, is a quintuple M = (Q, O, π, A, B) where

• Q is a finite set of states with n elements, and there is a bijection σ: Q → {1, ..., n}.

• O is a finite output alphabet (also called the set of possible observations) with m observations, and there is a bijection ω: O → {1, ..., m}.

• A = (A(i, j)) is an n × n matrix called the state transition probability matrix, with

A(i, j) ≥ 0, 1 ≤ i, j ≤ n, and ∑_{j=1}^{n} A(i, j) = 1, i = 1, ..., n.

• B = (B(i, j)) is an n × m matrix called the state observation probability matrix (also called confusion matrix), with

B(i, j) ≥ 0, 1 ≤ i ≤ n, 1 ≤ j ≤ m, and ∑_{j=1}^{m} B(i, j) = 1, i = 1, ..., n.

A matrix satisfying the above conditions is said to be row stochastic. Both A and B are row-stochastic.

We also need to state the conditions that make M a Markov model. To do this rigorously requires the notion of random variable and is a bit tricky (see the remark below), so we will cheat as follows:

(a) Given any sequence of states (q0, ..., q_{t−2}, p, q), the conditional probability that q is the t-th state given that the previous states were q0, ..., q_{t−2}, p is equal to the conditional probability that q is the t-th state given that the previous state at time t − 1 is p:

Pr(q | q0, ..., q_{t−2}, p) = Pr(q | p).

This is the Markov property. Informally, the next state q of the process at time t is independent of the past states q0, ..., q_{t−2}, provided that the present state p at time t − 1 is known.

(b) Given any sequence of states (q0, ..., qi, ..., qt), and given any sequence of outputs (O0, ..., Oi, ..., Ot), the conditional probability that the output Oi is emitted depends only on the state qi, and not on any other states or any other observations:

Pr(Oi | q0, ..., qi, ..., qt, O0, ..., Oi, ..., Ot) = Pr(Oi | qi).

This is the output independence condition. Informally, the output function is near-sighted.

Examples of HMMs are shown in Figure 4.1, Figure 4.2, and Figure 4.3. Note that an output is emitted when visiting a state, not when making a transition, as in the case of a gsm. So the analogy with the gsm model is only partial; it is meant as a motivation for HMMs.

The hidden Markov model was developed by L. E. Baum and colleagues at the Institute for Defense Analyses at Princeton (including Petrie, Egon, Sell, Soules, and Weiss), starting in 1966.

If we ignore the output components O and B, then we have what is called a Markov chain. A good interpretation of a Markov chain is the evolution over (discrete) time of the populations of n species that may change from one species to another. The probability A(i, j) is the fraction of the population of the i-th species that changes to the j-th species. If we denote the populations at time t by the row vector x = (x1, ..., xn), and the populations at time t + 1 by y = (y1, ..., yn), then

yj = A(1, j)x1 + ··· + A(i, j)xi + ··· + A(n, j)xn, 1 ≤ j ≤ n,

or in matrix form, y = xA. The condition ∑_{j=1}^{n} A(i, j) = 1 expresses that the total population is preserved, namely

y1 + ··· + yn = x1 + ··· + xn.

Remark: This remark is intended for the reader who knows some probability theory, and it can be skipped without any negative effect on understanding the rest of this chapter.

Given a probability space (Ω, F, µ) and any countable set Q (for simplicity we may assume Q is finite), a stochastic discrete-parameter process with state space Q is a countable family (Xt)_{t∈N} of random variables Xt: Ω → Q. We can think of t as time, and for any q ∈ Q, of Pr(Xt = q) as the probability that the process X is in state q at time t.
If

Pr(Xt = q | X0 = q0, ..., X_{t−2} = q_{t−2}, X_{t−1} = p) = Pr(Xt = q | X_{t−1} = p)

for all q0, ..., q_{t−2}, p, q ∈ Q and for all t ≥ 1, and if the probability on the right-hand side is independent of t, then we say that X = (Xt)_{t∈N} is a time-homogeneous Markov chain, for short, a Markov chain. Informally, the next state Xt of the process is independent of the past states X0, ..., X_{t−2}, provided that the present state X_{t−1} is known.

Since for simplicity Q is assumed to be finite, there is a bijection σ: Q → {1, ..., n}, and then the process X is completely determined by the probabilities

a_{ij} = Pr(Xt = q | X_{t−1} = p), i = σ(p), j = σ(q), p, q ∈ Q,

and if Q is a finite state space of size n, these form an n × n matrix A = (a_{ij}) called the Markov matrix of the process X. It is a row-stochastic matrix.

The beauty of Markov chains is that if we write

π(i) = Pr(X0 = i)

for the initial probability distribution, then the joint probability distribution of X0, X1, ..., Xt is given by

Pr(X0 = i0, X1 = i1, ..., Xt = it) = π(i0)A(i0, i1) ··· A(i_{t−1}, it).

The above expression only involves π and the matrix A, and makes no mention of the original measure space. Therefore, it doesn't matter what the probability space is!

Conversely, given an n × n row-stochastic matrix A, let Ω be the set of all countable sequences ω = (ω0, ω1, ..., ωt, ...) with ωt ∈ Q = {1, ..., n} for all t ∈ N, and let Xt: Ω → Q be the projection on the t-th component, namely Xt(ω) = ωt.² Then it is possible to define a σ-algebra (also called a σ-field) B and a measure µ on B such that (Ω, B, µ) is a probability space, and X = (Xt)_{t∈N} is a Markov chain with corresponding Markov matrix A.

To define B, proceed as follows. For every t ∈ N, let F_t be the family of all unions of subsets of Ω of the form

{ω ∈ Ω | (X0(ω) ∈ S0) ∧ (X1(ω) ∈ S1) ∧ ··· ∧ (Xt(ω) ∈ St)},

where S0, S1, ..., St are subsets of the state space Q = {1, ..., n}. It is not hard to show that each F_t is a σ-algebra. Then let

F = ⋃_{t≥0} F_t.

Each set in F is a set of paths for which a finite number of outcomes are restricted to lie in certain subsets of Q = {1, ..., n}. All other outcomes are unrestricted. In fact, every subset C in F is a countable union C = ⋃_{i∈N} B_i^{(t)} of sets of the form

B_i^{(t)} = {ω ∈ Ω | ω = (q0, q1, ..., qt, s_{t+1}, ..., sj, ...), q0, q1, ..., qt ∈ Q}
          = {ω ∈ Ω | X0(ω) = q0, X1(ω) = q1, ..., Xt(ω) = qt}.

²It is customary in probability theory to denote events by the letter ω. In the present case, ω denotes a countable sequence of elements from Q. This notation has nothing to do with the bijection ω: O → {1, ..., m} occurring in Definition 4.1.

The sequences in B_i^{(t)} are those beginning with the fixed sequence (q_0, q_1, ..., q_t). One can show that F is a field of sets, but not necessarily a σ-algebra, so we form the smallest σ-algebra G containing F. Using the matrix A we can define the measure ν(B_i^{(t)}) as the product of the probabilities along the sequence (q_0, q_1, ..., q_t). Then it can be shown that ν can be extended to a measure µ on G, and we let B be the σ-algebra obtained by adding to G all subsets of sets of measure zero. The resulting probability space (Ω, B, µ) is usually called the sequence space, and the measure µ is called the tree measure. Then it is easy to show that the family of random variables X_t: Ω → Q on the probability space (Ω, B, µ) is a time-homogeneous Markov chain whose Markov matrix is the original matrix A. The above construction is presented in full detail in Kemeny, Snell, and Knapp [10] (Chapter 2, Sections 1 and 2).

Most presentations of Markov chains do not even mention the probability space over which the random variables X_t are defined. This makes the whole thing quite mysterious, since the probabilities Pr(X_t = q) are by definition given by Pr(X_t = q) = µ({ω ∈ Ω | X_t(ω) = q}), which requires knowing the measure µ. This is even more problematic if we start with a stochastic matrix. What are the random variables X_t, and what are they defined on? The above construction puts things on firm ground.

There are three types of problems that can be solved using HMMs:

(1) The decoding problem: Given an HMM M = (Q, O, π, A, B), for any observed output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, find a most likely sequence of states S = (q_1, q_2, ..., q_T) that produces the output sequence O. More precisely, with our notational convention that σ(q_t) = i_t and ω(O_t) = ω_t, this means finding a sequence S such that the probability

Pr(S, O) = π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t-1}, i_t)B(i_t, ω_t)

is maximal. This problem is solved effectively by the Viterbi algorithm described below.
(2) The evaluation problem, also called the likelihood problem: Given a finite collection {M_1, ..., M_L} of HMMs with the same output alphabet O, for any output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, find which model M_ℓ is most likely to have generated O. More precisely, given any model M_k, we compute the probability tprob_k that M_k could have produced O along any path. Then we pick an HMM M_ℓ for which tprob_ℓ is maximal. We will return to this point after having described the Viterbi algorithm. A variation of the Viterbi algorithm called the forward algorithm effectively solves the evaluation problem.

(3) The training problem, also called the learning problem: Given a set {O_1, ..., O_r} of output sequences on the same output alphabet O, usually called a set of training data, and given Q, find the "best" π, A, and B for an HMM M that produces all the sequences in the training set, in the sense that the HMM M = (Q, O, π, A, B) is the most likely to have produced the sequences in the training set. The technique used here is called expectation maximization, or EM. It is an iterative method that starts with an initial triple π, A, B, and tries to improve it. There is such an algorithm known as the Baum-Welch or forward-backward algorithm, but it is beyond the scope of this introduction.

Let us now describe the Viterbi algorithm in more detail.

4.2 The Viterbi Algorithm and the Forward Algorithm

Given an HMM M = (Q, O, π, A, B), for any observed output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, we want to find a most likely sequence of states S = (q_1, q_2, ..., q_T) that produces the output sequence O.

Using the bijections σ: Q → {1, ..., n} and ω: O → {1, ..., m}, we can work with sequences of indices, and recall that we denote the index σ(q_t) associated with the t-th state q_t in the sequence S by i_t, and the index ω(O_t) associated with the t-th output O_t in the sequence O by ω_t. Then we need to find a sequence S such that the probability

Pr(S, O) = π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t-1}, i_t)B(i_t, ω_t)

is maximal. In general, there are n^T sequences of length T. This problem can be solved efficiently by a method based on dynamic programming. For any t, 1 ≤ t ≤ T, for any state q ∈ Q, if σ(q) = j, then we compute score(j, t), which is the largest probability that a sequence (q_1, ..., q_{t-1}, q) of length t ending with q has produced the output sequence (O_1, ..., O_{t-1}, O_t). The point is that if we know score(k, t−1) for k = 1, ..., n (with t ≥ 2), then we can find score(j, t) for j = 1, ..., n, because if we write k = σ(q_{t-1}) and j = σ(q) (recall that ω_t = ω(O_t)), then the probability associated with the path (q_1, ..., q_{t-1}, q) is

tscore(k, j) = score(k, t−1)A(k, j)B(j, ω_t).

[Illustration: the states q_1, ..., q_{t-1}, q with state indices i_1, ..., k, j under σ; the transition probability A(k, j) into q; the outputs O_1, ..., O_{t-1}, O_t with output indices ω_1, ..., ω_{t-1}, ω_t under ω, and the output probability B(j, ω_t).]

So to maximize this probability, we just have to find the maximum of the probabilities tscore(k, j) over all k, that is, we must have

score(j, t) = max_k tscore(k, j).

[Illustration: the n candidate predecessors σ^{-1}(1), ..., σ^{-1}(k), ..., σ^{-1}(n) of the state q = σ^{-1}(j), with the corresponding probabilities tscore(1, j), ..., tscore(k, j), ..., tscore(n, j).]

To get started, we set score(j, 1) = π(j)B(j, ω_1) for j = 1, ..., n.

The algorithm goes through a forward phase for t = 1, ..., T, during which it computes the probabilities score(j, t) for j = 1, ..., n. When t = T, we pick an index j such that score(j, T) is maximal. The machine learning community is fond of the notation

j = argmax_k score(k, T)

to express the above fact. Typically, the smallest index j corresponding to the maximum element in the list of probabilities (score(1, T), score(2, T), ..., score(n, T)) is returned. This gives us the last state q_T = σ^{-1}(j) in an optimal sequence that yields the output sequence O.

The algorithm then goes through a path retrieval phase. To do this, when we compute

score(j, t) = max_k tscore(k, j),

we also record the index k = σ(q_{t-1}) of the state q_{t-1} in the best sequence (q_1, ..., q_{t-1}, q_t) for which tscore(k, j) is maximal (with j = σ(q_t)), as pred(j, t) = k. The index k is often called the backpointer of j at time t. This index may not be unique; we just pick one of them. Again, this can be expressed by

pred(j, t) = argmax_k tscore(k, j).

Typically, the smallest index k corresponding to the maximum element in the list of probabilities (tscore(1, j), tscore(2, j), ..., tscore(n, j)) is returned. The predecessors pred(j, t) are only defined for t = 2, ..., T, but we can let pred(j, 1) = 0.

Observe that the path retrieval phase of the Viterbi algorithm is very similar to the phase of Dijkstra's algorithm for finding a shortest path that follows the prev array. One should not confuse this phase with what is called the backward algorithm, which is used in solving the learning problem. The forward phase of the Viterbi algorithm is quite different from Dijkstra's algorithm, and the Viterbi algorithm is actually simpler: it computes score(j, t) for all states and for t = 1, ..., T, whereas Dijkstra's algorithm maintains a list of unvisited vertices and needs to pick the next vertex. The major difference is that the Viterbi algorithm maximizes a product of weights along a path, whereas Dijkstra's algorithm minimizes a sum of weights along a path. Also, the Viterbi algorithm knows the length of the path (T) ahead of time, but Dijkstra's algorithm does not.

The Viterbi algorithm, invented by Andrew Viterbi in 1967, is shown below. The input to the algorithm is M = (Q, O, π, A, B) and the sequence of indices ω(O) = (ω_1, ..., ω_T) associated with the observed sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, with ω_t = ω(O_t) for t = 1, ..., T. The output is a sequence of states (q_1, ..., q_T). This sequence is determined by the sequence of indices (I_1, ..., I_T); namely, q_t = σ^{-1}(I_t).
The Viterbi Algorithm

begin
  for j = 1 to n do
    score(j, 1) = π(j)B(j, ω_1)
  endfor;

  (* forward phase to find the best (highest) scores *)
  for t = 2 to T do
    for j = 1 to n do
      for k = 1 to n do
        tscore(k) = score(k, t−1)A(k, j)B(j, ω_t)
      endfor;
      score(j, t) = max_k tscore(k);
      pred(j, t) = argmax_k tscore(k)
    endfor
  endfor;
  (* second phase to retrieve the optimal path *)
  I_T = argmax_j score(j, T);
  q_T = σ^{-1}(I_T);
  for t = T to 2 by −1 do
    I_{t-1} = pred(I_t, t);
    q_{t-1} = σ^{-1}(I_{t-1})
  endfor
end

An illustration of the Viterbi algorithm applied to Example 4.1 was presented after Example 4.3. If we run the Viterbi algorithm on the output sequence (S, M, S, L) of Example 4.2, we find that the sequence (Cold, Cold, Cold, Hot) has the highest probability among all sequences of length four.

One may have noticed that the numbers involved, being products of probabilities, become quite small. Indeed, underflow may arise in dynamic programming. Fortunately, there is a simple way to avoid underflow by taking logarithms. We initialize the algorithm by computing

score(j, 1) = log[π(j)] + log[B(j, ω_1)],

and in the step where tscore is computed we use the formula

tscore(k) = score(k, t−1) + log[A(k, j)] + log[B(j, ω_t)].

It is immediately verified that the time complexity of the Viterbi algorithm is O(n²T).

Let us now turn to the second problem, the evaluation problem (or likelihood problem). This time, given a finite collection {M_1, ..., M_L} of HMMs with the same output alphabet O, for any observed output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, find which model M_ℓ is most likely to have generated O. More precisely, given any model M_k,
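The two-phase algorithm above can be sketched directly in code. This is a minimal sketch, not the text's notation: states and output symbols are indexed 0..n−1 and 0..m−1 (so score(j, 1) becomes score[j][0]), and π, A, B are NumPy arrays.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence for an observed index sequence obs (0-based):
    a forward scoring phase, then backtracking through the pred (backpointer) table."""
    n, T = len(pi), len(obs)
    score = np.zeros((n, T))
    pred = np.zeros((n, T), dtype=int)
    score[:, 0] = pi * B[:, obs[0]]                      # score(j, 1) in the text
    for t in range(1, T):
        for j in range(n):
            tscore = score[:, t - 1] * A[:, j] * B[j, obs[t]]
            score[j, t] = tscore.max()                   # score(j, t)
            pred[j, t] = tscore.argmax()                 # backpointer
    # path retrieval phase
    path = [int(score[:, T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(pred[path[-1], t]))
    return path[::-1]
```

NumPy's argmax, like the convention mentioned in the text, returns the smallest index attaining the maximum, so ties are broken the same way.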

we compute the probability tprob_k that M_k could have produced O along any sequence of states S = (q_1, ..., q_T). Then we pick an HMM M_ℓ for which tprob_ℓ is maximal. The probability tprob_k that we are seeking is given by

tprob_k = Pr(O) = Σ_{(i_1,...,i_T) ∈ {1,...,n}^T} Pr((q_{i_1}, ..., q_{i_T}), O)
                = Σ_{(i_1,...,i_T) ∈ {1,...,n}^T} π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t-1}, i_t)B(i_t, ω_t),

where {1, ..., n}^T denotes the set of all sequences of length T consisting of elements from the set {1, ..., n}.

It is not hard to see that a brute-force computation requires 2Tn^T multiplications. Fortunately, it is easy to adapt the Viterbi algorithm to compute tprob_k efficiently. Since we are not looking for an explicit path, there is no need for the second phase, and during the forward phase, going from t−1 to t, rather than finding the maximum of the scores tscore(k) for k = 1, ..., n, we just set score(j, t) to the sum over k of the temporary scores tscore(k). At the end, tprob_k is the sum over j of the probabilities score(j, T).

The algorithm solving the evaluation problem, known as the forward algorithm, is shown below. The input to the algorithm is M = (Q, O, π, A, B) and the sequence of indices ω(O) = (ω_1, ..., ω_T) associated with the observed sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, with ω_t = ω(O_t) for t = 1, ..., T. The output is the probability tprob.

The Forward Algorithm

begin
  for j = 1 to n do
    score(j, 1) = π(j)B(j, ω_1)
  endfor;
  for t = 2 to T do
    for j = 1 to n do
      for k = 1 to n do
        tscore(k) = score(k, t−1)A(k, j)B(j, ω_t)
      endfor;
      score(j, t) = Σ_k tscore(k)
    endfor

  endfor;
  tprob = Σ_j score(j, T)
end

We can now run the above algorithm on M_1, ..., M_L to compute tprob_1, ..., tprob_L, and we pick the model M_ℓ for which tprob_ℓ is maximal. As for the Viterbi algorithm, the time complexity of the forward algorithm is O(n²T).

Underflow is also a problem with the forward algorithm. At first glance it looks like taking logarithms does not help, because there is no simple expression for log(x_1 + ··· + x_n) in terms of the log x_i. Fortunately, we can use the log-sum exp trick (which I learned from Mitch Marcus), namely the identity

log(Σ_{i=1}^{n} e^{x_i}) = a + log(Σ_{i=1}^{n} e^{x_i − a})

for all x_1, ..., x_n ∈ R and all a ∈ R (take exponentials on both sides). Then, if we pick a = max_{1≤i≤n} x_i, we get 1 ≤ Σ_{i=1}^{n} e^{x_i − a} ≤ n, so

max_{1≤i≤n} x_i ≤ log(Σ_{i=1}^{n} e^{x_i}) ≤ max_{1≤i≤n} x_i + log n,

which shows that max_{1≤i≤n} x_i is a good approximation for log(Σ_{i=1}^{n} e^{x_i}). For any positive reals y_1, ..., y_n, if we let x_i = log y_i, then we get

log(Σ_{i=1}^{n} y_i) = a + log(Σ_{i=1}^{n} e^{log(y_i) − a}),

with a = max_{1≤i≤n} log y_i. We will use this trick to compute

log(score(j, t)) = log(Σ_{k=1}^{n} e^{log(tscore(k))}) = a + log(Σ_{k=1}^{n} e^{log(tscore(k)) − a}),

with a = max_{1≤k≤n} log(tscore(k)), where tscore(k) could be very small, but log(tscore(k)) is not, so computing log(tscore(k)) − a does not cause underflow, and

1 ≤ Σ_{k=1}^{n} e^{log(tscore(k)) − a} ≤ n,

since log(tscore(k)) − a ≤ 0 and one of these terms is equal to zero, so even if some of the terms e^{log(tscore(k)) − a} are very small, this does not cause any trouble. We will also use this trick to compute log(tprob) = log(Σ_{j=1}^{n} score(j, T)) in terms of the log(score(j, T)).
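The log-sum exp identity and the forward algorithm in log space can be sketched as follows. This is a minimal sketch under the same 0-based indexing assumption as before; the inputs are the elementwise logarithms of π, A, B.

```python
import numpy as np

def logsumexp(xs):
    """log(sum_i e^{x_i}) computed stably via the identity in the text:
    a + log(sum_i e^{x_i - a}) with a = max_i x_i."""
    xs = np.asarray(xs, dtype=float)
    a = xs.max()
    return a + np.log(np.exp(xs - a).sum())

def forward_log(log_pi, log_A, log_B, obs):
    """Forward algorithm entirely in log space; returns log(tprob)."""
    n, T = len(log_pi), len(obs)
    score = log_pi + log_B[:, obs[0]]            # log(score(j, 1))
    for t in range(1, T):
        score = np.array([logsumexp(score + log_A[:, j]) + log_B[j, obs[t]]
                          for j in range(n)])
    return logsumexp(score)
```

For comparison, `scipy.special.logsumexp` implements the same trick; the version above is written out to match the identity in the text.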

We leave it as an exercise to the reader to modify the forward algorithm so that it computes log(score(j, t)) and log(tprob) using the log-sum exp trick. If you use Matlab, then this is quite easy, because Matlab does a lot of the work for you: it can apply operators such as exp or sum to vectors.

Example 4.3. To illustrate the forward algorithm, assume that our observant student also recorded the drinking behavior of a professor at Harvard, and that he came up with the HMM shown in Figure 4.3.

[Figure 4.3: Example of an HMM modeling the drinking behavior of a professor at Harvard; states start, Cold, Hot, with outputs N and D.]

However, the student can't remember whether he observed the sequence NNND at Penn or at Harvard. So he runs the forward algorithm on both HMMs to find the most likely model. Do it!

Following Jurafsky, the following chronology shows how the Viterbi algorithm has had applications in many separate fields.

  Citation                      Field
  Viterbi (1967)                information theory
  Vintsyuk (1968)               speech processing
  Needleman and Wunsch (1970)   molecular biology
  Sakoe and Chiba (1971)        speech processing
  Sankoff (1972)                molecular biology
  Reichert et al. (1973)        molecular biology
  Wagner and Fischer (1974)     computer science

Readers who wish to learn more about HMMs should begin with Stamp [19], a great tutorial which contains a very clear and easy to read presentation. Another nice introduction is given in Rich [18] (Chapter 5, Section 5.11). A much more complete, yet accessible, coverage of HMMs is found in Rabiner's tutorial [16]. Jurafsky and Martin's online Chapter 9 (Hidden Markov Models) is also a very good and informal tutorial (see jurafsky/slp3/9.pdf).

A very clear and quite accessible presentation of Markov chains is given in Cinlar [4]. Another thorough but a bit more advanced presentation is given in Brémaud [3]. Other presentations of Markov chains can be found in Mitzenmacher and Upfal [13], and in Grimmett and Stirzaker [9].

Acknowledgments: I would like to thank Mitch Marcus, Jocelyn Quaintance, and Joao Sedoc for scrutinizing my work and for many insightful comments.


Chapter 5

Regular Languages and Equivalence Relations, The Myhill-Nerode Characterization, State Equivalence

5.1 Directed Graphs and Paths

It is often useful to view DFA's and NFA's as labeled directed graphs.

Definition 5.1. A directed graph is a quadruple G = (V, E, s, t), where V is a set of vertices, or nodes, E is a set of edges, or arcs, and s, t: E → V are two functions, s being called the source function, and t the target function. Given an edge e ∈ E, we also call s(e) the origin (or source) of e, and t(e) the endpoint (or target) of e.

Remark: The functions s, t need not be injective or surjective. Thus, we allow isolated vertices.

Example: Let G be the directed graph defined such that

E = {e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8},  V = {v_1, v_2, v_3, v_4, v_5, v_6},

and

s(e_1) = v_1, s(e_2) = v_2, s(e_3) = v_3, s(e_4) = v_4, s(e_5) = v_2, s(e_6) = v_5, s(e_7) = v_5, s(e_8) = v_5,
t(e_1) = v_2, t(e_2) = v_3, t(e_3) = v_4, t(e_4) = v_2, t(e_5) = v_5, t(e_6) = v_5, t(e_7) = v_6, t(e_8) = v_6.

Such a graph can be represented by the following diagram:

[Figure 5.1: A directed graph, with vertices v_1, ..., v_6 and edges e_1, ..., e_8 as defined above.]

In drawing directed graphs, we will usually omit edge names (the e_i), and sometimes even the node names (the v_j). We now define paths in a directed graph.

Definition 5.2. Given a directed graph G = (V, E, s, t), for any two nodes u, v ∈ V, a path from u to v is a triple π = (u, e_1...e_n, v), where e_1...e_n is a string (sequence) of edges in E such that s(e_1) = u, t(e_n) = v, and t(e_i) = s(e_{i+1}) for all i such that 1 ≤ i ≤ n−1. When n = 0, we must have u = v, and the path (u, ǫ, u) is called the null path from u to u. The number n is the length of the path. We also call u the source (or origin) of the path, and v the target (or endpoint) of the path. When there is a nonnull path π from u to v, we say that u and v are connected.

Remark: In a path π = (u, e_1...e_n, v), the expression e_1...e_n is a sequence, and thus, the e_i are not necessarily distinct.

For example, the following are paths:

π_1 = (v_1, e_1 e_5 e_7, v_6),

π_2 = (v_2, e_2 e_3 e_4 e_2 e_3 e_4 e_2 e_3 e_4, v_2), and π_3 = (v_1, e_1 e_2 e_3 e_4 e_2 e_3 e_4 e_5 e_6 e_6 e_8, v_6).

Clearly, π_2 and π_3 are of a different nature from π_1. Indeed, they contain cycles. This is formalized as follows.

Definition 5.3. Given a directed graph G = (V, E, s, t), for any node u ∈ V a cycle (or loop) through u is a nonnull path of the form π = (u, e_1...e_n, u) (equivalently, t(e_n) = s(e_1)). More generally, a nonnull path π = (u, e_1...e_n, v) contains a cycle iff for some i, j, with 1 ≤ i ≤ j ≤ n, t(e_j) = s(e_i). In this case, letting w = t(e_j) = s(e_i), the path (w, e_i...e_j, w) is a cycle through w. A path π is acyclic iff it does not contain any cycle. Note that each null path (u, ǫ, u) is acyclic.

Obviously, a cycle π = (u, e_1...e_n, u) through u is also a cycle through every node t(e_i). Also, a path π may contain several different cycles.

Paths can be concatenated as follows.

Definition 5.4. Given a directed graph G = (V, E, s, t), two paths π_1 = (u, e_1...e_m, v) and π_2 = (u', e'_1...e'_n, v') can be concatenated provided that v = u', in which case their concatenation is the path

π_1 π_2 = (u, e_1...e_m e'_1...e'_n, v').

It is immediately verified that the concatenation of paths is associative, and that the concatenation of the path π = (u, e_1...e_m, v) with the null path (u, ǫ, u) or with the null path (v, ǫ, v) is the path π itself.

The following fact, although almost trivial, is used all the time, and is worth stating in detail.

Proposition 5.1. Given a directed graph G = (V, E, s, t), if the set of nodes V contains m ≥ 1 nodes, then every path π of length at least m contains some cycle.

A consequence of Proposition 5.1 is that in a finite graph with m nodes, given any two nodes u, v ∈ V, in order to find out whether there is a path from u to v, it is enough to consider paths of length ≤ m−1. Indeed, if there is a path between u and v, then there is some path π of minimal length (not necessarily unique, but this doesn't matter). If this minimal path has length at least m, then by the Proposition, it contains a cycle.
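The cycle condition of Definition 5.3 (some i ≤ j with t(e_j) = s(e_i)) is easy to check mechanically. A minimal sketch, where an edge is represented as a hypothetical (source, target) pair and the source and target functions are passed in explicitly:

```python
def contains_cycle(path_edges, s, t):
    """Check whether a path, given as its list of edges e_1, ..., e_n, contains a
    cycle: per Definition 5.3, some i <= j with t(e_j) = s(e_i)."""
    n = len(path_edges)
    for i in range(n):
        for j in range(i, n):
            if t(path_edges[j]) == s(path_edges[i]):
                return True
    return False

# Edges encoded as (source, target) pairs, with nodes named by numbers.
src = lambda e: e[0]
tgt = lambda e: e[1]
print(contains_cycle([(1, 2), (2, 5), (5, 6)], src, tgt))  # acyclic path: False
print(contains_cycle([(2, 3), (3, 4), (4, 2)], src, tgt))  # a cycle through 2: True
```

The two examples mirror π_1 (acyclic) and one round of π_2 (a cycle) from the text, with edges written as node pairs.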
However, by deleting this cycle from the path π, we get an even shorter path from u to v, contradicting the minimality of π. We now turn to labeled graphs.

5.2 Labeled Graphs and Automata

In fact, we only need edge-labeled graphs.

Definition 5.5. A labeled directed graph is a tuple G = (V, E, L, s, t, λ), where V is a set of vertices, or nodes, E is a set of edges, or arcs, L is a set of labels, s, t: E → V are two functions, s being called the source function, and t the target function, and λ: E → L is the labeling function. Given an edge e ∈ E, we also call s(e) the origin (or source) of e, t(e) the endpoint (or target) of e, and λ(e) the label of e.

Note that the function λ need not be injective or surjective. Thus, distinct edges may have the same label.

Example: Let G be the directed graph defined such that

E = {e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8},  V = {v_1, v_2, v_3, v_4, v_5, v_6},  L = {a, b},

and

s(e_1) = v_1, s(e_2) = v_2, s(e_3) = v_3, s(e_4) = v_4, s(e_5) = v_2, s(e_6) = v_5, s(e_7) = v_5, s(e_8) = v_5,
t(e_1) = v_2, t(e_2) = v_3, t(e_3) = v_4, t(e_4) = v_2, t(e_5) = v_5, t(e_6) = v_5, t(e_7) = v_6, t(e_8) = v_6,
λ(e_1) = a, λ(e_2) = b, λ(e_3) = a, λ(e_4) = a, λ(e_5) = b, λ(e_6) = a, λ(e_7) = a, λ(e_8) = b.

Such a labeled graph can be represented by the following diagram:

In drawing labeled graphs, we will usually omit edge names (the e_i), and sometimes even the node names (the v_j). Paths, cycles, and concatenation of paths are defined just as before (that is, we ignore the labels). However, we can now define the spelling of a path.

Definition 5.6. Given a labeled directed graph G = (V, E, L, s, t, λ), for any two nodes u, v ∈ V, for any path π = (u, e_1...e_n, v), the spelling of the path π is the string of labels λ(e_1) ··· λ(e_n). When n = 0, the spelling of the null path (u, ǫ, u) is the null string ǫ.

For example, the spelling of the path π_3 = (v_1, e_1 e_2 e_3 e_4 e_2 e_3 e_4 e_5 e_6 e_6 e_8, v_6)

[Figure 5.2: A labeled directed graph — the graph of Figure 5.1 with edge labels a and b as defined above.]

is abaabaabaab.

Every DFA and every NFA can be viewed as a labeled graph, in such a way that the set of spellings of paths from the start state to some final state is the language accepted by the automaton in question.

Given a DFA D = (Q, Σ, δ, q_0, F), where δ: Q × Σ → Q, we associate the labeled directed graph G_D = (V, E, L, s, t, λ) defined as follows:

V = Q,
E = {(p, a, q) | q = δ(p, a), p, q ∈ Q, a ∈ Σ},
L = Σ,
s((p, a, q)) = p, t((p, a, q)) = q, and λ((p, a, q)) = a.

Such labeled graphs have a special structure that can easily be characterized.

It is easily shown that a string w ∈ Σ* is in the language L(D) = {w ∈ Σ* | δ*(q_0, w) ∈ F} iff w is the spelling of some path in G_D from q_0 to some final state.
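The characterization just stated — w ∈ L(D) iff w spells a path in G_D from q_0 to a final state — amounts to simulating the DFA edge by edge. A minimal sketch; the DFA below (two states tracking the parity of b's) is a hypothetical example, not one from the text:

```python
def accepts(delta, q0, F, w):
    """Simulate a DFA: w is in L(D) iff delta*(q0, w) is in F, i.e. iff w is the
    spelling of a path in G_D from q0 to a final state."""
    q = q0
    for a in w:
        q = delta[(q, a)]  # follow the unique edge out of q labeled a
    return q in F

# Hypothetical DFA over {a, b} accepting strings with an even number of b's.
delta = {('even', 'a'): 'even', ('even', 'b'): 'odd',
         ('odd', 'a'): 'odd', ('odd', 'b'): 'even'}
print(accepts(delta, 'even', {'even'}, 'abba'))  # True
print(accepts(delta, 'even', {'even'}, 'ab'))    # False
```

Determinism is what makes the simulation a single walk: from each state p there is exactly one edge (p, a, δ(p, a)) per letter a, so the path spelling w is unique.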

Similarly, given an NFA N = (Q, Σ, δ, q_0, F), where δ: Q × (Σ ∪ {ǫ}) → 2^Q, we associate the labeled directed graph G_N = (V, E, L, s, t, λ) defined as follows:

V = Q,
E = {(p, a, q) | q ∈ δ(p, a), p, q ∈ Q, a ∈ Σ ∪ {ǫ}},
L = Σ ∪ {ǫ},
s((p, a, q)) = p, t((p, a, q)) = q, λ((p, a, q)) = a.

Remark: When N has no ǫ-transitions, we can let L = Σ.

Such labeled graphs also have a special structure that can easily be characterized. Again, a string w ∈ Σ* is in the language L(N) = {w ∈ Σ* | δ*(q_0, w) ∩ F ≠ ∅} iff w is the spelling of some path in G_N from q_0 to some final state.

5.3 The Closure Definition of the Regular Languages

Let Σ = {a_1, ..., a_m} be some alphabet. We would like to define a family of languages, R(Σ), by singling out some very basic (atomic) languages, namely the languages {a_1}, ..., {a_m}, the empty language ∅, and the trivial language {ǫ}, and then forming more complicated languages by repeatedly forming union, concatenation and Kleene * of previously constructed languages. By doing so, we hope to get a family of languages R(Σ) that is closed under union, concatenation, and Kleene *. This means that for any two languages L_1, L_2 ∈ R(Σ), we also have L_1 ∪ L_2 ∈ R(Σ) and L_1 L_2 ∈ R(Σ), and for any language L ∈ R(Σ), we have L* ∈ R(Σ). Furthermore, we would like R(Σ) to be the smallest family with these properties. How do we achieve this rigorously?

First, let us look more closely at what we mean by a family of languages. Recall that a language (over Σ) is any subset L of Σ*. Thus, the set of all languages is 2^{Σ*}, the power set of Σ*. If Σ is nonempty, this is an uncountable set. Next, we define a family L of languages to be any subset of 2^{Σ*}. This time, the set of families of languages is 2^{2^{Σ*}}. This is a huge set. We can use the inclusion relation on 2^{2^{Σ*}} to define a partial order on families of languages. So, L_1 ⊆ L_2 iff for every language L, if L ∈ L_1 then L ∈ L_2.

We can now state more precisely what we are trying to do. Consider the following properties for a family of languages L:

(1) We have {a_1}, ..., {a_m}, ∅, {ǫ} ∈ L, i.e., L contains the "atomic" languages.

(2a) For all L_1, L_2 ∈ L, we also have L_1 ∪ L_2 ∈ L.
(2b) For all L_1, L_2 ∈ L, we also have L_1 L_2 ∈ L.

(2c) For all L ∈ L, we also have L* ∈ L.

In other words, L is closed under union, concatenation and Kleene *. Now, what we want is the smallest (w.r.t. inclusion) family of languages that satisfies properties (1) and (2)(a)(b)(c). We can construct such a family using an inductive definition. This inductive definition constructs a sequence of families of languages, (R(Σ)_n)_{n≥0}, called the stages of the inductive definition, as follows:

R(Σ)_0 = {{a_1}, ..., {a_m}, ∅, {ǫ}},
R(Σ)_{n+1} = R(Σ)_n ∪ {L_1 ∪ L_2, L_1 L_2, L* | L_1, L_2, L ∈ R(Σ)_n}.

Then, we define R(Σ) by

R(Σ) = ⋃_{n≥0} R(Σ)_n.

Thus, a language L belongs to R(Σ) iff it belongs to R(Σ)_n, for some n ≥ 0. For example, if Σ = {a, b}, we have

R(Σ)_1 = {{a}, {b}, ∅, {ǫ}, {a, b}, {a, ǫ}, {b, ǫ}, {ab}, {ba}, {aa}, {bb}, {a}*, {b}*, ...}.

Some of the languages that will appear in R(Σ)_2 are:

{a, bb}, {ab, ba}, {aabb}, {a}{a}*, {aa}{b}*, {abb}*.

Observe that

R(Σ)_0 ⊆ R(Σ)_1 ⊆ R(Σ)_2 ⊆ ··· ⊆ R(Σ)_n ⊆ R(Σ)_{n+1} ⊆ ··· ⊆ R(Σ),

so that if L ∈ R(Σ)_n, then L ∈ R(Σ)_p for all p ≥ n. Also, there is some smallest n for which L ∈ R(Σ)_n (the birthdate of L!). In fact, all these inclusions are strict. Note that each R(Σ)_n only contains a finite number of languages (but some of the languages in R(Σ)_n are infinite, because of Kleene *).

Then we define the Regular languages, Version 2, as the family R(Σ). Of course, it is far from obvious that R(Σ) coincides with the family of languages accepted by DFA's (or NFA's), what we call the regular languages, Version 1. However, this is the case, and this can be demonstrated by giving two algorithms. Actually, it will be slightly more convenient to define a notation system, the regular expressions, to denote the languages in R(Σ). Then, we will give an algorithm that converts a regular expression R into an NFA N_R, so that L_R = L(N_R), where L_R is the language (in R(Σ)) denoted by R. We will also give an algorithm that converts an NFA N into a regular expression R_N, so that L(R_N) = L(N). But before doing all this, we should make sure that R(Σ) is indeed the family that we are seeking. This is the content of

Proposition 5.2. The family R(Σ) is the smallest family of languages which contains the atomic languages {a_1}, ..., {a_m}, ∅, {ǫ}, and is closed under union, concatenation, and Kleene *.

Proof. There are two things to prove.

(i) We need to prove that R(Σ) has properties (1) and (2)(a)(b)(c).

(ii) We need to prove that R(Σ) is the smallest family having properties (1) and (2)(a)(b)(c).

(i) Since R(Σ)_0 = {{a_1}, ..., {a_m}, ∅, {ǫ}}, it is obvious that (1) holds. Next, assume that L_1, L_2 ∈ R(Σ). This means that there are some integers n_1, n_2 ≥ 0 so that L_1 ∈ R(Σ)_{n_1} and L_2 ∈ R(Σ)_{n_2}. Now, it is possible that n_1 ≠ n_2, but if we let n = max{n_1, n_2}, as we observed that R(Σ)_p ⊆ R(Σ)_q whenever p ≤ q, we are guaranteed that both L_1, L_2 ∈ R(Σ)_n. However, by the definition of R(Σ)_{n+1} (that's why we defined it this way!), we have L_1 ∪ L_2 ∈ R(Σ)_{n+1} ⊆ R(Σ). The same argument proves that L_1 L_2 ∈ R(Σ)_{n+1} ⊆ R(Σ). Also, if L ∈ R(Σ)_n, we immediately have L* ∈ R(Σ)_{n+1} ⊆ R(Σ). Therefore, R(Σ) has properties (1) and (2)(a)(b)(c).

(ii) Let L be any family of languages having properties (1) and (2)(a)(b)(c). We need to prove that R(Σ) ⊆ L. If we can prove that R(Σ)_n ⊆ L for all n ≥ 0, we are done (since then, R(Σ) = ⋃_{n≥0} R(Σ)_n ⊆ L). We prove by induction on n that R(Σ)_n ⊆ L for all n ≥ 0.

The base case n = 0 is trivial, since L has (1), which says that R(Σ)_0 ⊆ L.

Assume inductively that R(Σ)_n ⊆ L. We need to prove that R(Σ)_{n+1} ⊆ L. Pick any L ∈ R(Σ)_{n+1}. Recall that

R(Σ)_{n+1} = R(Σ)_n ∪ {L_1 ∪ L_2, L_1 L_2, L* | L_1, L_2, L ∈ R(Σ)_n}.

If L ∈ R(Σ)_n, then L ∈ L, since R(Σ)_n ⊆ L by the induction hypothesis. Otherwise, there are three cases:

(a) L = L_1 ∪ L_2, where L_1, L_2 ∈ R(Σ)_n. By the induction hypothesis, R(Σ)_n ⊆ L, so we get L_1, L_2 ∈ L; since L has (2a), we have L_1 ∪ L_2 ∈ L.

(b) L = L_1 L_2, where L_1, L_2 ∈ R(Σ)_n. By the induction hypothesis, R(Σ)_n ⊆ L, so we get L_1, L_2 ∈ L; since L has (2b), we have L_1 L_2 ∈ L.

(c) L = L_1*, where L_1 ∈ R(Σ)_n. By the induction hypothesis, R(Σ)_n ⊆ L, so we get L_1 ∈ L; since L has (2c), we have L_1* ∈ L.
Thus, in all cases, we showed that L ∈ L, and so R(Σ)_{n+1} ⊆ L, which proves the induction step.

Note: A given language L may be built up in different ways. For example,

{a, b}* = ({a}* {b}*)*.

Students should study carefully the above proof. Although simple, it is the prototype of many proofs appearing in the theory of computation.

5.4 Regular Expressions

The definition of the family of languages R(Σ) given in the previous section in terms of an inductive definition is good for proving properties of these languages, but it is not very convenient for manipulating them in a practical way. To do so, it is better to introduce a symbolic notation system, the regular expressions. Regular expressions are certain strings formed according to rules that mimic the inductive rules for constructing the families R(Σ)_n.

The set of regular expressions R(Σ) over an alphabet Σ is a language defined on an alphabet Δ defined as follows. Given an alphabet Σ = {a_1, ..., a_m}, consider the new alphabet

Δ = Σ ∪ {+, ·, *, (, ), ∅, ǫ}.

We define the family (R(Σ)_n) of languages over Δ as follows:

R(Σ)_0 = {a_1, ..., a_m, ∅, ǫ},
R(Σ)_{n+1} = R(Σ)_n ∪ {(R_1 + R_2), (R_1 · R_2), R* | R_1, R_2, R ∈ R(Σ)_n}.

Then, we define R(Σ) as

R(Σ) = ⋃_{n≥0} R(Σ)_n.

Note that every language R(Σ)_n is finite. For example, if Σ = {a, b}, we have

R(Σ)_1 = {a, b, ∅, ǫ,
  (a+b), (b+a), (a+a), (b+b), (a+ǫ), (ǫ+a), (b+ǫ), (ǫ+b),
  (a+∅), (∅+a), (b+∅), (∅+b), (ǫ+ǫ), (ǫ+∅), (∅+ǫ), (∅+∅),
  (a·b), (b·a), (a·a), (b·b), (a·ǫ), (ǫ·a), (b·ǫ), (ǫ·b), (ǫ·ǫ),
  (a·∅), (∅·a), (b·∅), (∅·b), (ǫ·∅), (∅·ǫ), (∅·∅),
  a*, b*, ǫ*, ∅*}.

Some of the regular expressions appearing in R(Σ)_2 are:

(a + (b·b)), ((a·b) + (b·a)), ((a·b)·b), ((a·a)·(b·b)), (a·a)*, ((a·a)*·b*), (b·b)*.

Definition 5.7. The set R(Σ) is the set of regular expressions (over Σ).

Proposition 5.3. The language R(Σ) is the smallest language which contains the symbols a_1, ..., a_m, ∅, ǫ from Δ, and such that (R_1 + R_2), (R_1 · R_2), and R* also belong to R(Σ), when R_1, R_2, R ∈ R(Σ).

For simplicity of notation, we write (R_1 R_2) instead of (R_1 · R_2).

Examples:

R = (a + b)*,  S = (a*·b*)*,  T = ((a + b)*·a)·((a + b) ··· (a + b)),

with n copies of (a + b) in the second factor of T.

5.5 Regular Expressions and Regular Languages

Every regular expression R ∈ R(Σ) can be viewed as the name, or denotation, of some language L ∈ R(Σ). Similarly, every language L ∈ R(Σ) is the interpretation (or meaning) of some regular expression R ∈ R(Σ). Think of a regular expression R as a program, and of L(R) as the result of the execution or evaluation of R by L. This can be made rigorous by defining a function

L: R(Σ) → R(Σ).

This function is defined recursively as follows:

L[a_i] = {a_i},
L[∅] = ∅,
L[ǫ] = {ǫ},
L[(R_1 + R_2)] = L[R_1] ∪ L[R_2],
L[(R_1 · R_2)] = L[R_1] L[R_2],
L[R*] = L[R]*.
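The recursive definition of L can be turned directly into an evaluator. This is a minimal sketch: regular expressions are encoded as hypothetical nested tuples rather than strings over Δ, and since L[R*] may be infinite, the evaluator computes the finite approximation of L[R] restricted to strings of length at most `upto`.

```python
def lang(R, upto):
    """Evaluate L[R], restricted to strings of length <= upto.
    R is a nested tuple: ('sym', a), ('empty',), ('eps',),
    ('+', R1, R2), ('.', R1, R2), ('*', R1)."""
    op = R[0]
    if op == 'sym':   return {R[1]}            # L[a_i] = {a_i}
    if op == 'empty': return set()             # L[emptyset] = emptyset
    if op == 'eps':   return {''}              # L[eps] = {eps}
    if op == '+':                              # union
        return lang(R[1], upto) | lang(R[2], upto)
    if op == '.':                              # concatenation, length-bounded
        return {u + v for u in lang(R[1], upto) for v in lang(R[2], upto)
                if len(u + v) <= upto}
    if op == '*':                              # Kleene star as a fixpoint
        result, frontier = {''}, {''}
        base = lang(R[1], upto)
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= upto} - result
            result |= frontier
        return result
```

With this evaluator one can check concretely that L is not injective, e.g. that (a + b)* and (a*·b*)* agree on all strings up to a given length.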

Proposition 5.4. For every regular expression R ∈ R(Σ), the language L[R] is regular (Version 2), i.e. L[R] ∈ R(Σ). Conversely, for every regular (Version 2) language L ∈ R(Σ), there is some regular expression R ∈ R(Σ) such that L = L[R].

Proof. To prove that L[R] ∈ R(Σ) for all R ∈ R(Σ), we prove by induction on n ≥ 0 that if R ∈ R(Σ)_n, then L[R] ∈ R(Σ)_n. To prove that L is surjective, we prove by induction on n ≥ 0 that if L ∈ R(Σ)_n, then there is some R ∈ R(Σ)_n such that L = L[R].

Note: The function L is not injective. Example: if R = (a + b)* and S = (a*·b*)*, then L[R] = L[S] = {a, b}*.

For simplicity, we often denote L[R] as L_R. As examples, we have

L[(((a·b)·b) + a)] = {a, abb},
L[((((a*·b)·a*)·b)·a*)] = {w ∈ {a, b}* | w has two b's},
L[(((a*·b)·(a*·b))*·a*)] = {w ∈ {a, b}* | w has an even # of b's},
L[((((a*·b)·(a*·b))*·(a*·b))·a*)] = {w ∈ {a, b}* | w has an odd # of b's}.

Remark. If

R = ((a + b)*·a)·((a + b) ··· (a + b)),

with n copies of (a + b) in the second factor, it can be shown that any minimal DFA accepting L_R has 2^{n+1} states. Yet, both ((a + b)*·a) and ((a + b) ··· (a + b)) denote languages that can be accepted by small DFA's (of size 2 and n + 2).

Definition 5.8. Two regular expressions R, S ∈ R(Σ) are equivalent, denoted as R ∼= S, iff L[R] = L[S].

It is immediate that ∼= is an equivalence relation. The relation ∼= satisfies some (nice) identities. For example:

(((a·a) + b) + c) ∼= ((a·a) + (b + c)),
((a·a)·(b·(c·c))) ∼= (((a·a)·b)·(c·c)),
(a*)* ∼= a*,

and more generally

((R_1 + R_2) + R_3) ∼= (R_1 + (R_2 + R_3)),
((R_1 · R_2) · R_3) ∼= (R_1 · (R_2 · R_3)),
(R_1 + R_2) ∼= (R_2 + R_1),
(R*·R*) ∼= R*,
(R*)* ∼= R*.

There is an algorithm to test the equivalence of regular expressions, but its complexity is exponential. Such an algorithm uses the conversion of a regular expression to an NFA, and the subset construction for converting an NFA to a DFA. Then the problem of deciding whether two regular expressions R and S are equivalent is reduced to testing whether two DFA's D_1 and D_2 accept the same languages (the equivalence problem for DFA's). This last problem is equivalent to testing whether

L(D_1) − L(D_2) = ∅ and L(D_2) − L(D_1) = ∅.

But L(D_1) − L(D_2) (and similarly L(D_2) − L(D_1)) is accepted by a DFA obtained by the cross-product construction for the relative complement (with final states F_1 × (Q_2 − F_2) and (Q_1 − F_1) × F_2, respectively). Thus, in the end, the equivalence problem for regular expressions reduces to the problem of testing whether a DFA D = (Q, Σ, δ, q_0, F) accepts the empty language, which is equivalent to Q_r ∩ F = ∅, where Q_r is the set of states reachable from q_0. This last problem is a reachability problem in a directed graph which is easily solved in polynomial time.

It is an open problem to prove that the problem of testing the equivalence of regular expressions cannot be decided in polynomial time.

In the next two sections we show the equivalence of NFA's and regular expressions, by providing an algorithm to construct an NFA from a regular expression, and an algorithm for constructing a regular expression from an NFA. This will show that the regular languages Version 1 coincide with the regular languages Version 2.

5.6 Regular Expressions and NFA's

Proposition 5.5. There is an algorithm, which, given any regular expression R ∈ R(Σ), constructs an NFA N_R accepting L_R, i.e., such that L_R = L(N_R).

In order to ensure the correctness of the construction as well as to simplify the description of the algorithm, it is convenient to assume that our NFA's satisfy the following conditions:

1. Each NFA has a single final state, t, distinct from the start state, s.

2.
There are no incoming transitions into the start state, s, and no outgoing transitions from the final state, t.

3. Every state has at most two incoming and two outgoing transitions.

Here is the algorithm. For the base case, either

(a) R = a_i, in which case N_R is the NFA with a single transition labeled a_i from s to t:

Figure 5.3: NFA for a_i

(b) R = ε, in which case N_R is the NFA with a single ε-transition from s to t:

Figure 5.4: NFA for ε

(c) R = ∅, in which case N_R is the NFA with states s and t and no transition at all:

Figure 5.5: NFA for ∅

The recursive clauses are as follows:

(i) If our expression is (R + S), the algorithm is applied recursively to R and S, generating NFA's N_R and N_S, and then these two NFA's are combined in parallel as shown in Figure 5.6: a new start state s has ε-transitions to the old start states s1 of N_R and s2 of N_S, and the old final states t1 and t2 have ε-transitions to a new final state t.

Figure 5.6: NFA for (R + S)

(ii) If our expression is (R · S), the algorithm is applied recursively to R and S, generating NFA's N_R and N_S, and then these NFA's are combined sequentially as shown in Figure 5.7 by merging the old final state, t1, of N_R with the old start state, s2, of N_S:

Figure 5.7: NFA for (R · S)

Note that since there are no incoming transitions into s2 in N_S, once we enter N_S, there is no way of reentering N_R, and so the construction is correct (it yields the concatenation L_R L_S).

(iii) If our expression is R*, the algorithm is applied recursively to R, generating the NFA N_R. Then we construct the NFA shown in Figure 5.8 by adding a new start state s and a new final state t, an ε-transition from the old final state, t1, of N_R back to the old start state, s1, of N_R, ε-transitions from s to s1 and from t1 to t, and, as ε is not necessarily accepted by N_R, an ε-transition from s to t:

Figure 5.8: NFA for R*

Since there are no outgoing transitions from t1 in N_R, we can only loop back to s1 from t1 using the new ε-transition from t1 to s1, and so the NFA of Figure 5.8 does accept L_R*.

The algorithm that we just described is sometimes called the sombrero construction. As a corollary of this construction, we get

Reg. languages version 2 ⊆ Reg. languages version 1.

The reader should check that if one constructs the NFA corresponding to the regular expression (a + b)*abb and then applies the subset construction, one gets the DFA shown in Figure 5.9.

Figure 5.9: A non-minimal DFA for {a,b}*{abb} (a five-state DFA with states A, B, C, D, E)

We now consider the construction of a regular expression from an NFA.

Proposition 5.6. There is an algorithm, which, given any NFA N, constructs a regular expression R ∈ R(Σ) denoting L(N), i.e., such that L_R = L(N).

As a corollary,

Reg. languages version 1 ⊆ Reg. languages version 2.

This is the node elimination algorithm. The general idea is to allow more general labels on the edges of an NFA, namely, regular expressions. Then, such generalized NFA's are simplified by eliminating nodes one at a time, and readjusting labels.

Preprocessing, phase 1:

If necessary, we need to add a new start state with an ε-transition to the old start state, if there are incoming edges into the old start state.

If necessary, we need to add a new (unique) final state with ε-transitions from each of the old final states to the new final state, if there is more than one final state or some outgoing edge from any of the old final states.

At the end of this phase, the start state, say s, is a source (no incoming edges), and the final state, say t, is a sink (no outgoing edges).

Preprocessing, phase 2:

We need to flatten parallel edges. For any pair of states (p, q) (p = q is possible), if there are k edges from p to q labeled u1, ..., uk, then create a single edge labeled with the regular expression

u1 + ··· + uk.

For any pair of states (p, q) (p = q is possible) such that there is no edge from p to q, we put an edge labeled ∅.

At the end of this phase, the resulting generalized NFA is such that for any pair of states (p, q) (where p = q is possible), there is a unique edge labeled with some regular expression denoted as R_{p,q}. When R_{p,q} = ∅, this really means that there is no edge from p to q in the original NFA N.

By interpreting each R_{p,q} as a function call (really, a macro) to the NFA N_{p,q} accepting L[R_{p,q}] (constructed using the previous algorithm), we can verify that the original language L(N) is accepted by this new generalized NFA.

Node elimination only applies if the generalized NFA has at least one node distinct from s and t. Pick any node r distinct from s and t. For every pair (p, q) where p ≠ r and q ≠ r, replace the label of the edge from p to q as indicated below:

Figure 5.10: Before eliminating node r (edge R_{p,r} from p to r, loop R_{r,r} on r, edge R_{r,q} from r to q, and edge R_{p,q} from p to q)

Figure 5.11: After eliminating node r, the edge from p to q is labeled

R_{p,q} + R_{p,r} R_{r,r}* R_{r,q}.

At the end of this step, delete the node r and all edges adjacent to r. Note that p = q is possible, in which case the triangle is flat. It is also possible that p = s or q = t. Also, this step is performed for all pairs (p, q), which means that both (p, q) and (q, p) are considered (when p ≠ q). Note that this step only has an effect if there are edges from p to r and from r to q in the original NFA N. Otherwise, r can simply be deleted, as well as the edges adjacent to r.

Other simplifications can be made. For example, when R_{r,r} = ∅, we can simplify R_{p,r} R_{r,r}* R_{r,q} to R_{p,r} R_{r,q}. When R_{p,q} = ∅, the new label is just R_{p,r} R_{r,r}* R_{r,q}.

The order in which the nodes are eliminated is irrelevant, although it affects the size of the final expression.

The algorithm stops when the only remaining nodes are s and t. Then, the label R of the edge from s to t is a regular expression denoting L(N).

For example, let

L = {w ∈ Σ* | w contains an odd number of a's or an odd number of b's}.

An NFA for L after the preprocessing phase is shown in Figure 5.12.

Figure 5.12: NFA for L (after preprocessing phase), with new start state 0, old states 1, 2, 3, 4, and new final state 5

After eliminating node 2:

Figure 5.13: NFA for L (after eliminating node 2)

After eliminating node 3:

Figure 5.14: NFA for L (after eliminating node 3)

After eliminating node 4:

Figure 5.15: NFA for L (after eliminating node 4)

where

T = a + b + (ab + ba)(aa + bb)*(ε + a + b)

and

S = aa + bb + (ab + ba)(aa + bb)*(ab + ba).

Finally, after eliminating node 1, we get:

R = (aa + bb + (ab + ba)(aa + bb)*(ab + ba))* (a + b + (ab + ba)(aa + bb)*(ε + a + b)).

5.7 Applications of Regular Expressions: Lexical Analysis, Finding Patterns in Text

Regular expressions have several practical applications. The first important application is to lexical analysis. A lexical analyzer is the first component of a compiler. The purpose of a lexical analyzer is to scan the source program and break it into atomic components, known as tokens, i.e., substrings of consecutive characters that belong together logically.

Examples of tokens are: identifiers, keywords, numbers (in fixed point notation or floating point notation, etc.), arithmetic operators (+, −, *, ^), comparison operators (<, >, =, <>), the assignment operator (:=), etc.

Tokens can be described by regular expressions. For this purpose, it is useful to enrich the syntax of regular expressions, as in UNIX. For example, the 26 upper case letters of the (Roman) alphabet, A, ..., Z, can be specified by the expression

[A-Z]

Similarly, the ten digits, 0, 1, ..., 9, can be specified by the expression

[0-9]

The regular expression

R1 + R2 + ··· + Rk

is denoted

[R1 R2 ··· Rk]

So, the expression

[A-Za-z0-9]

denotes any letter (upper case or lower case) or digit. This is called an alphanumeric.

If we define an identifier as a string beginning with a letter (upper case or lower case) followed by any number of alphanumerics (including none), then we can use the following expression to specify identifiers:

[A-Za-z][A-Za-z0-9]*

There are systems, such as lex or flex, that accept as input a list of regular expressions describing the tokens of a programming language and construct a lexical analyzer for these tokens.
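A toy version of what such a generator produces can be sketched directly with Python's `re` module, whose character-class syntax matches the enriched notation above. This is only an illustration (a real lexer handles keywords, operators, comments, and much more); the token kinds and the `scan` helper are invented for the example:

```python
import re

# Token patterns in the enriched (UNIX-style) syntax described above.
IDENT  = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # letter, then alphanumerics
NUMBER = re.compile(r"[0-9]+")                 # one or more digits

def scan(source):
    """Break `source` into (kind, lexeme) tokens, skipping whitespace.
    Each pattern is tried in order at the current position."""
    tokens, i = [], 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        for kind, pat in (("IDENT", IDENT), ("NUMBER", NUMBER)):
            m = pat.match(source, i)
            if m:
                tokens.append((kind, m.group()))
                i = m.end()
                break
        else:
            raise ValueError(f"unexpected character {source[i]!r}")
    return tokens
```

For instance, `scan("x1 42 foo")` yields an identifier, a number, and another identifier, exactly as the expressions [A-Za-z][A-Za-z0-9]* and [0-9]+ prescribe.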

Such systems are called lexical analyzer generators. Basically, they build a DFA from the set of regular expressions using the algorithms that have been described earlier. Usually, it is possible to associate with every expression some action to be taken when the corresponding token is recognized.

Another application of regular expressions is finding patterns in text. Using a regular expression, we can specify a vaguely defined class of patterns. Take the example of a street address. Most street addresses end with "Street", or "Avenue", or "Road", or "St.", or "Ave.", or "Rd.". We can design a regular expression that captures the shape of most street addresses and then convert it to a DFA that can be used to search for street addresses in text. For more on this, see Hopcroft, Motwani and Ullman.

5.8 Summary of Closure Properties of the Regular Languages

The family of regular languages is closed under many operations. In particular, it is closed under the operations listed below. Some of the closure properties are left as a homework problem.

(1) Union, intersection, relative complement.
(2) Concatenation, Kleene *, Kleene +.
(3) Homomorphisms and inverse homomorphisms.
(4) gsm and inverse gsm mappings, a-transductions and inverse a-transductions.

Another useful operation is substitution. Given any two alphabets Σ, ∆, a substitution is a function, τ: Σ → 2^(∆*), assigning some language, τ(a) ⊆ ∆*, to every symbol a ∈ Σ. A substitution τ: Σ → 2^(∆*) is extended to a map τ: 2^(Σ*) → 2^(∆*) by first extending τ to strings using the following definition

τ(ε) = {ε},
τ(ua) = τ(u)τ(a),

where u ∈ Σ* and a ∈ Σ, and then to languages by letting

τ(L) = ⋃_{w ∈ L} τ(w),

for every language L ⊆ Σ*. Observe that a homomorphism is a special kind of substitution. A substitution is a regular substitution iff τ(a) is a regular language for every a ∈ Σ. The proof of the next proposition is left as a homework problem.

Proposition 5.7. If L is a regular language and τ is a regular substitution, then τ(L) is also regular. Thus, the family of regular languages is closed under regular substitutions.

5.9 Right-Invariant Equivalence Relations on Σ*

The purpose of this section is to give one more characterization of the regular languages in terms of certain kinds of equivalence relations on strings. Pushing this characterization a bit further, we will be able to show how minimal DFA's can be found.

Let D = (Q, Σ, δ, q0, F) be a DFA. The DFA D may be redundant, for example, if there are states that are not accessible from the start state. The set Q_r of accessible or reachable states is the subset of Q defined as

Q_r = {p ∈ Q | ∃w ∈ Σ*, δ*(q0, w) = p}.

If Q ≠ Q_r, we can clean up D by deleting the states in Q − Q_r and restricting the transition function δ to Q_r. This way, we get an equivalent DFA D_r such that L(D) = L(D_r), where all the states of D_r are reachable. From now on, we assume that we are dealing with DFA's such that D = D_r, called trim, or reachable.

Recall that an equivalence relation ≃ on a set A is a relation which is reflexive, symmetric, and transitive. Given any a ∈ A, the set

{b ∈ A | a ≃ b}

is called the equivalence class of a, and it is denoted as [a]_≃, or even as [a]. Recall that for any two elements a, b ∈ A, [a] ∩ [b] = ∅ iff a and b are not equivalent, and [a] = [b] iff a ≃ b.

The set of equivalence classes associated with the equivalence relation ≃ is a partition Π of A (also denoted as A/≃). This means that it is a family of nonempty pairwise disjoint sets whose union is equal to A itself. The equivalence classes are also called the blocks of the partition Π. The number of blocks in the partition Π is called the index of ≃ (and Π).

Given any two equivalence relations ≃1 and ≃2 with associated partitions Π1 and Π2,

≃1 ⊆ ≃2

iff every block of the partition Π1 is contained in some block of the partition Π2.
Then, every block of the partition Π2 is the union of blocks of the partition Π1, and we say that ≃1 is a refinement of ≃2 (and similarly, Π1 is a refinement of Π2). Note that Π2 has at most as many blocks as Π1 does.
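The substitution operation of Section 5.8 can be sketched concretely. This is a minimal illustration on finite languages only (real closure arguments work on language descriptions, not enumerations); the function name and the example τ are invented for the demonstration:

```python
def apply_substitution(tau, L):
    """Apply a substitution tau: symbol -> set of strings to a finite
    language L, using tau(eps) = {eps} and tau(ua) = tau(u) tau(a)."""
    def tau_str(w):
        result = {""}                      # tau(epsilon) = {epsilon}
        for a in w:
            # concatenate every image of the prefix with every image of a
            result = {u + v for u in result for v in tau[a]}
        return result

    out = set()
    for w in L:                            # tau(L) = union of tau(w), w in L
        out |= tau_str(w)
    return out

# A sample substitution over Sigma = {a, b}, Delta = {0, 1}:
tau = {"a": {"0", "00"}, "b": {"1"}}
```

For instance, τ applied to {ab} yields {01, 001}, since the symbol a may be replaced by either 0 or 00. Taking every τ(a) to be a singleton recovers the special case of a homomorphism.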

We now define an equivalence relation on strings induced by a DFA. This equivalence is a kind of observational equivalence, in the sense that we decide that two strings u, v are equivalent iff, when feeding first u and then v to the DFA, u and v drive the DFA to the same state. From the point of view of the observer, u and v have the same effect (reaching the same state).

Definition 5.9. Given a DFA D = (Q, Σ, δ, q0, F), we define the relation ≃_D on Σ* as follows: for any two strings u, v ∈ Σ*,

u ≃_D v iff δ*(q0, u) = δ*(q0, v).

Example 5.1. We can figure out what the equivalence classes of ≃_D are for the DFA over {a, b} shown in the figure, with three states 0, 1, 2, where 0 is both the start state and the (unique) final state. One can check on the transition diagram that various distinct strings drive D to the same state, and that there are three equivalence classes:

[ε], [a], [aa].

Observe that L(D) = [ε]. Also, the equivalence classes are in one-to-one correspondence with the states of D.

The relation ≃_D turns out to have some interesting properties. In particular, it is right-invariant, which means that for all u, v, w ∈ Σ*, if u ≃_D v, then uw ≃_D vw.

Proposition 5.8. Given any (trim) DFA D = (Q, Σ, δ, q0, F), the relation ≃_D is an equivalence relation which is right-invariant and has finite index. Furthermore, if Q has n states, then the index of ≃_D is n, and every equivalence class of ≃_D is a regular language. Finally, L(D) is the union of some of the equivalence classes of ≃_D.

Proof. The fact that ≃_D is an equivalence relation is a trivial verification. To prove that ≃_D is right-invariant, we first prove by induction on the length of v that for all u, v ∈ Σ*, for all p ∈ Q,

δ*(p, uv) = δ*(δ*(p, u), v).

Then, if u ≃_D v, which means that δ*(q0, u) = δ*(q0, v), we have

δ*(q0, uw) = δ*(δ*(q0, u), w) = δ*(δ*(q0, v), w) = δ*(q0, vw),

which means that uw ≃_D vw. Thus, ≃_D is right-invariant. We still have to prove that ≃_D has index n.

Define the function f: Σ* → Q such that f(u) = δ*(q0, u). Note that if u ≃_D v, which means that δ*(q0, u) = δ*(q0, v), then f(u) = f(v). Thus, the function f: Σ* → Q has the same value on all the strings in some equivalence class [u], so it induces a function f̂: Π → Q defined such that

f̂([u]) = f(u)

for every equivalence class [u] ∈ Π, where Π = Σ*/≃_D is the partition associated with ≃_D. This function is well defined since f(v) has the same value for all elements v in the equivalence class [u].

However, the function f̂: Π → Q is injective (one-to-one), since f̂([u]) = f̂([v]) is equivalent to f(u) = f(v) (since by definition of f̂ we have f̂([u]) = f(u) and f̂([v]) = f(v)), which by definition of f means that δ*(q0, u) = δ*(q0, v), which means precisely that u ≃_D v, that is, [u] = [v].

Since Q has n states, Π has at most n blocks. Moreover, since every state is accessible, for every q ∈ Q, there is some w ∈ Σ* so that δ*(q0, w) = q, which shows that f̂([w]) = f(w) = q. Consequently, f̂ is also surjective. But then, being injective and surjective, f̂ is bijective and Π has exactly n blocks.

Every equivalence class of Π is a set of strings of the form

{w ∈ Σ* | δ*(q0, w) = p},

for some p ∈ Q, which is accepted by the DFA

D_p = (Q, Σ, δ, q0, {p})

obtained from D by changing F to {p}. Thus, every equivalence class is a regular language. Finally, since

L(D) = {w ∈ Σ* | δ*(q0, w) ∈ F}
     = ⋃_{f ∈ F} {w ∈ Σ* | δ*(q0, w) = f}
     = ⋃_{f ∈ F} L(D_f),

we see that L(D) is the union of the equivalence classes corresponding to the final states in F.
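The bijection between classes of ≃_D and states can be observed experimentally by bucketing strings according to the state they reach. This is only a sketch (it enumerates strings up to a bounded length, and the two-state parity DFA used here is a hypothetical example, not one from the text):

```python
from itertools import product

def delta_star(delta, q, w):
    """delta* extended to strings: run the DFA from state q on w."""
    for a in w:
        q = delta[(q, a)]
    return q

def classes_of_D(delta, q0, sigma, max_len):
    """Bucket all strings of length <= max_len by delta*(q0, w); each
    bucket is a finite approximation of one class of ~=_D."""
    buckets = {}
    for n in range(max_len + 1):
        for t in product(sigma, repeat=n):
            w = "".join(t)
            buckets.setdefault(delta_star(delta, q0, w), []).append(w)
    return buckets

# A trim two-state DFA over {a, b} tracking the parity of the number of a's
# (state 0 = even, state 1 = odd):
parity = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
```

Running `classes_of_D(parity, 0, "ab", 2)` produces exactly two buckets, one per state, illustrating that the index of ≃_D equals the number of states of a trim DFA.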

One should not be too optimistic and hope that every equivalence relation on strings is right-invariant.

Example 5.2. For example, if Σ = {a}, consider the equivalence relation ≃ given by the partition with the two blocks

{ε, a, a^4, a^9, a^16, ..., a^{n²}, ...} (n ≥ 0) and {a^2, a^3, a^5, a^6, a^7, a^8, ..., a^m, ...} (m not a square).

We have a ≃ a^4, yet by concatenating on the right with a^5, since a·a^5 = a^6 and a^4·a^5 = a^9, we get a^6 and a^9, which are not equivalent (9 is a square but 6 is not), so ≃ is not right-invariant. It turns out that the problem is that neither equivalence class is a regular language.

It is worth noting that a right-invariant equivalence relation is not necessarily left-invariant, where left-invariance means that if u ≃ v then wu ≃ wv for all w.

Example 5.3. For example, if ≃ is given by the four equivalence classes

C1 = {bb}*, C2 = {bb}*a, C3 = b{bb}*, C4 = {bb}*a{a,b}+ ∪ b{bb}*a{a,b}*,

then we can check that ≃ is right-invariant by figuring out the inclusions C_i a ⊆ C_j and C_i b ⊆ C_j, which are recorded in the following table:

        a    b
  C1    C2   C3
  C2    C4   C4
  C3    C4   C1
  C4    C4   C4

However, both ab, ba ∈ C4, yet bab ∈ C4 and bba ∈ C2, so ≃ is not left-invariant.

The remarkable fact due to Myhill and Nerode is that Proposition 5.8 has a converse. Indeed, given a right-invariant equivalence relation ≃ of finite index it is possible to reconstruct a DFA, and by a suitable choice of final state, every equivalence class is accepted by such a DFA. Let us show how this DFA is constructed using a simple example.

Example 5.4. Consider the equivalence relation ≃ on {a,b}* given by the three equivalence classes

C1 = {ε}, C2 = a{a,b}*, C3 = b{a,b}*.

We leave it as an easy exercise to check that ≃ is right-invariant. For example, if u ≃ v and u, v ∈ C2, then u = ax and v = ay for some x, y ∈ {a,b}*, so for any w ∈ {a,b}* we have uw = axw and vw = ayw, which means that we also have uw, vw ∈ C2, thus uw ≃ vw.
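The right-invariance of a partition like the one in Example 5.3 can be machine-checked on all short strings: if the class of wa (and wb) depends only on the class of w, the partition is consistent with the table. A sketch, using Python regexes as membership tests for the four classes (the helper names are invented for the example):

```python
import re
from itertools import product

# Membership tests for the four classes of Example 5.3.
CLASSES = {
    "C1": re.compile(r"(?:bb)*"),
    "C2": re.compile(r"(?:bb)*a"),
    "C3": re.compile(r"b(?:bb)*"),
    "C4": re.compile(r"(?:bb)*a[ab]+|b(?:bb)*a[ab]*"),
}

def class_of(w):
    for name, pat in CLASSES.items():
        if pat.fullmatch(w):
            return name
    raise ValueError(f"{w!r} is in no class: not a partition")

def check_right_invariant(max_len):
    """On all strings w with |w| <= max_len, verify that the class of wa
    (resp. wb) depends only on the class of w; return the induced table."""
    table = {}
    for n in range(max_len + 1):
        for t in product("ab", repeat=n):
            w = "".join(t)
            for a in "ab":
                key = (class_of(w), a)
                tgt = class_of(w + a)
                # a clash here would disprove right-invariance
                assert table.setdefault(key, tgt) == tgt
    return table
```

The returned table reproduces the inclusion table of Example 5.3 (e.g., C1·a ⊆ C2 and C3·b ⊆ C1), while the failure of left-invariance (bab ∈ C4 but bba ∈ C2) is of course not detected by this one-sided check.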

For any subset C ⊆ {a,b}* and any string w ∈ {a,b}*, define Cw as the set of strings

Cw = {uw | u ∈ C}.

There are two reasons why a DFA can be recovered from the right-invariant equivalence relation ≃:

(1) For every equivalence class C_i and every string w, there is a unique equivalence class C_j such that C_i w ⊆ C_j. Actually, it is enough to check the above property for strings w of length 1 (i.e., symbols in the alphabet) because the property for arbitrary strings follows by induction.

(2) For every w ∈ Σ* and every class C_i,

C1 w ⊆ C_i iff w ∈ C_i,

where C1 is the equivalence class of the empty string.

We can make a table recording these inclusions.

Example 5.5. Continuing Example 5.4, we get:

        a    b
  C1    C2   C3
  C2    C2   C2
  C3    C3   C3

For example, from C1 = {ε} we have C1 a = {a} ⊆ C2 and C1 b = {b} ⊆ C3; for C2 = a{a,b}*, we have C2 a = a{a,b}*a ⊆ C2 and C2 b = a{a,b}*b ⊆ C2; and for C3 = b{a,b}*, we have C3 a = b{a,b}*a ⊆ C3 and C3 b = b{a,b}*b ⊆ C3.

The key point is that the above table is the transition table of a DFA with start state C1 = [ε]. Furthermore, if C_i (i = 1, 2, 3) is chosen as a single final state, the corresponding DFA D_i accepts C_i. This is the converse of Myhill-Nerode!

Observe that the inclusions C_i w ⊆ C_j may be strict inclusions. For example, C1 a = {a} is a proper subset of C2 = a{a,b}*.

Let us do another example.

Example 5.6. Consider the equivalence relation ≃ given by the four equivalence classes

C1 = {ε}, C2 = {a}, C3 = {b}+, C4 = a{a,b}+ ∪ {b}+a{a,b}*.

We leave it as an easy exercise to check that ≃ is right-invariant. We obtain the following table of inclusions C_i a ⊆ C_j and C_i b ⊆ C_j:

        a    b
  C1    C2   C3
  C2    C4   C4
  C3    C4   C3
  C4    C4   C4

For example, from C3 = {b}+ we get C3 a = {b}+a ⊆ C4, and C3 b = {b}+b ⊆ C3. The above table is the transition function of a DFA with four states and start state C1. If C_i (i = 1, 2, 3, 4) is chosen as a single final state, the corresponding DFA D_i accepts C_i.

Here is the general result.

Proposition 5.9. Given any equivalence relation ≃ on Σ*, if ≃ is right-invariant and has finite index n, then every equivalence class (block) in the partition Π associated with ≃ is a regular language.

Proof. Let C1, ..., Cn be the blocks of Π, and assume that C1 = [ε] is the equivalence class of the empty string.

First, we claim that for every block C_i and every w ∈ Σ*, there is a unique block C_j such that C_i w ⊆ C_j, where C_i w = {uw | u ∈ C_i}. For every u ∈ C_i, the string uw belongs to one and only one of the blocks of Π, say C_j. For any other string v ∈ C_i, since (by definition) u ≃ v, by right-invariance, we get uw ≃ vw, but since uw ∈ C_j and C_j is an equivalence class, we also have vw ∈ C_j. This proves the first claim.

We also claim that for every w ∈ Σ* and every block C_i,

C1 w ⊆ C_i iff w ∈ C_i.

If C1 w ⊆ C_i, since C1 = [ε], we have εw = w ∈ C_i. Conversely, if w ∈ C_i, for any v ∈ C1 = [ε], since ε ≃ v, by right-invariance we have w ≃ vw, and thus vw ∈ C_i, which shows that C1 w ⊆ C_i.

For every class C_k, let

D_k = ({1, ..., n}, Σ, δ, 1, {k}),

where δ(i, a) = j iff C_i a ⊆ C_j. We will prove the following equivalence:

δ*(i, w) = j iff C_i w ⊆ C_j.

For this, we prove the following two implications by induction on |w|:

(a) If δ*(i, w) = j, then C_i w ⊆ C_j, and
(b) If C_i w ⊆ C_j, then δ*(i, w) = j.

The base case (w = ε) is trivial for both (a) and (b). We leave the proof of the induction step for (a) as an exercise and give the proof of the induction step for (b) because it is more subtle.

Let w = ua, with a ∈ Σ and u ∈ Σ*. If C_i ua ⊆ C_j, then by the first claim, we know that there is a unique block, C_k, such that C_i u ⊆ C_k. Furthermore, there is a unique block, C_h, such that C_k a ⊆ C_h, but C_i u ⊆ C_k implies C_i ua ⊆ C_k a, so we get C_i ua ⊆ C_h. However, by the uniqueness of the block, C_j, such that C_i ua ⊆ C_j, we must have C_h = C_j. By the induction hypothesis, as C_i u ⊆ C_k, we have δ*(i, u) = k and, by definition of δ, as C_k a ⊆ C_j (= C_h), we have δ(k, a) = j, so we deduce that

δ*(i, ua) = δ(δ*(i, u), a) = δ(k, a) = j,

as desired.

Then, using the equivalence just proved and the second claim, we have

L(D_k) = {w ∈ Σ* | δ*(1, w) ∈ {k}}
       = {w ∈ Σ* | δ*(1, w) = k}
       = {w ∈ Σ* | C1 w ⊆ C_k}
       = {w ∈ Σ* | w ∈ C_k}
       = C_k,

proving that every block, C_k, is a regular language.

In general it is false that C_i a = C_j for some block C_j, and we can only claim that C_i a ⊆ C_j.

We can combine Proposition 5.8 and Proposition 5.9 to get the following characterization of a regular language due to Myhill and Nerode:

Theorem 5.10. (Myhill-Nerode) A language L (over an alphabet Σ) is a regular language iff it is the union of some of the equivalence classes of an equivalence relation ≃ on Σ* which is right-invariant and has finite index.

Theorem 5.10 can also be used to prove that certain languages are not regular. A general scheme (not the only one) goes as follows. If L is not regular, then it must be infinite. Now, we argue by contradiction. If L was regular, then by Myhill-Nerode, there would be some equivalence relation ≃, which is right-invariant and of finite index, and such that L is the union of some of the classes of ≃. Because Σ* is infinite and ≃ has only finitely many equivalence classes, there are strings x, y ∈ Σ* with x ≠ y so that x ≃ y.

If we can find a third string, z ∈ Σ*, such that xz ∈ L and yz ∉ L, then we reach a contradiction. Indeed, by right-invariance, from x ≃ y, we get xz ≃ yz. But, L is the union of equivalence classes of ≃, so if xz ∈ L, then we should also have yz ∈ L, contradicting yz ∉ L. Therefore, L is not regular.

Then the scenario is this: to prove that L is not regular, first we check that L is infinite. If so, we try finding three strings x, y, z, where x and y (with x ≠ y) are prefixes of strings in L such that x ≃ y, where ≃ is a right-invariant relation of finite index such that L is the union of equivalence classes of ≃ (which must exist by Myhill-Nerode, since we are assuming by contradiction that L is regular), and where z is chosen so that xz ∈ L and yz ∉ L.

Example 5.7. For example, we prove that L = {a^n b^n | n ≥ 1} is not regular. Assuming for the sake of contradiction that L is regular, there is some equivalence relation ≃ which is right-invariant and of finite index and such that L is the union of some of the classes of ≃. Since the sequence

a, aa, aaa, ..., a^i, ...

is infinite and ≃ has a finite number of classes, two of these strings must belong to the same class, which means that a^i ≃ a^j for some i ≠ j. But since ≃ is right-invariant, by concatenating with b^i on the right, we see that a^i b^i ≃ a^j b^i for some i ≠ j. However a^i b^i ∈ L, and since L is the union of classes of ≃, we also have a^j b^i ∈ L for i ≠ j, which is absurd, given the definition of L. Thus, in fact, L is not regular.

Here is another illustration of the use of the Myhill-Nerode Theorem to prove that a language is not regular.

Example 5.8. We claim that the language L = {a^{n!} | n ≥ 1} is not regular, where n! (n factorial) is given by 0! = 1 and (n+1)! = (n+1)·n!. Assume L is regular. Then, there is some equivalence relation ≃ which is right-invariant and of finite index and such that L is the union of some of the classes of ≃. Since the sequence

a, a^2, ..., a^n, ...

is infinite, two of these strings must belong to the same class, which means that a^p ≃ a^q for some p, q with 1 ≤ p < q. As q! ≥ q for all q ≥ 0 and q > p, we can concatenate on the right with a^{q!−p} and we get

a^p a^{q!−p} ≃ a^q a^{q!−p},

that is,

a^{q!} ≃ a^{q!+q−p}.

Since p < q we have q! < q! + q − p. If we can show that

q! + q − p < (q+1)!

we will obtain a contradiction, because then a^{q!+q−p} ∉ L, yet a^{q!+q−p} ≃ a^{q!} and a^{q!} ∈ L, contradicting Myhill-Nerode. Now, as 1 ≤ p < q, we have q − p ≤ q − 1, so if we can prove that

q! + q − p ≤ q! + q − 1 < (q+1)!

we will be done. However, q! + q − 1 < (q+1)! is equivalent to

q − 1 < (q+1)! − q!,

and since (q+1)! − q! = (q+1)q! − q! = q·q!, we simply need to prove that

q − 1 < q ≤ q·q!,

which holds for q ≥ 1.

There is another version of the Myhill-Nerode Theorem involving congruences which is also quite useful. An equivalence relation, ≃, on Σ* is left and right-invariant iff for all x, y, u, v ∈ Σ*,

if x ≃ y then uxv ≃ uyv.

An equivalence relation, ≃, on Σ* is a congruence iff for all u1, u2, v1, v2 ∈ Σ*,

if u1 ≃ v1 and u2 ≃ v2 then u1u2 ≃ v1v2.

It is easy to prove that an equivalence relation is a congruence iff it is left and right-invariant. For example, assume that ≃ is a left and right-invariant equivalence relation, and assume that

u1 ≃ v1 and u2 ≃ v2.

By right-invariance applied to u1 ≃ v1, we get

u1u2 ≃ v1u2

and by left-invariance applied to u2 ≃ v2 we get

v1u2 ≃ v1v2.

By transitivity, we conclude that

u1u2 ≃ v1v2,

which shows that ≃ is a congruence. Proving that a congruence is left and right-invariant is even easier.

There is a version of Proposition 5.8 that applies to congruences, and for this we define the relation ∼_D as follows: for any (trim) DFA, D = (Q, Σ, δ, q0, F), for all x, y ∈ Σ*,

x ∼_D y iff (∀q ∈ Q)(δ*(q, x) = δ*(q, y)).

Proposition 5.11. Given any (trim) DFA, D = (Q, Σ, δ, q0, F), the relation ∼_D is an equivalence relation which is left and right-invariant and has finite index. Furthermore, if Q has n states, then the index of ∼_D is at most n^n, and every equivalence class of ∼_D is a regular language. Finally, L(D) is the union of some of the equivalence classes of ∼_D.

Proof. We leave most of the proof of Proposition 5.11 as an exercise. The last two parts of the proposition are proved using the following facts:

(1) Since ∼_D is left and right-invariant and has finite index, in particular, ∼_D is right-invariant and has finite index, so by Proposition 5.9 every equivalence class of ∼_D is regular.

(2) Observe that ∼_D ⊆ ≃_D, since the condition δ*(q, x) = δ*(q, y) holds for every q ∈ Q, so in particular for q = q0. But then, every equivalence class of ≃_D is the union of equivalence classes of ∼_D, and since, by Proposition 5.8, L is the union of equivalence classes of ≃_D, we conclude that L is also the union of equivalence classes of ∼_D. This completes the proof.

Using Proposition 5.11 and Proposition 5.9, we obtain another version of the Myhill-Nerode Theorem.

Theorem 5.12. (Myhill-Nerode, Congruence Version) A language L (over an alphabet Σ) is a regular language iff it is the union of some of the equivalence classes of an equivalence relation ≃ on Σ* which is a congruence and has finite index.

We now consider an equivalence relation associated with a language L.
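The relation ∼_D can also be explored experimentally: a string x acts on the whole state set, so its behavior is the tuple (δ*(q, x))_{q ∈ Q}, and x ∼_D y iff the tuples agree. Since there are at most n^n such tuples, the index is at most n^n. A sketch, reusing the hypothetical two-state parity DFA (not an example from the text):

```python
from itertools import product

def run(delta, q, w):
    """delta*: run the DFA from state q on string w."""
    for a in w:
        q = delta[(q, a)]
    return q

def congruence_classes(delta, states, sigma, max_len):
    """Bucket strings of length <= max_len by their full behavior tuple
    (delta*(q, w) for every q); equal tuples witness w ~_D w'."""
    buckets = {}
    for n in range(max_len + 1):
        for t in product(sigma, repeat=n):
            w = "".join(t)
            key = tuple(run(delta, q, w) for q in states)
            buckets.setdefault(key, []).append(w)
    return buckets

# Two-state DFA over {a, b} tracking the parity of the number of a's:
parity = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
```

For the parity DFA, a string's behavior is determined by the parity of its number of a's, so ∼_D has only two classes here, well below the n^n = 4 bound; in general ∼_D refines ≃_D, as used in the proof above.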

5.10 Finding Minimal DFA's

Given any language L (not necessarily regular), we can define an equivalence relation ρ_L on Σ* which is right-invariant, but not necessarily of finite index. The equivalence relation ρ_L is such that L is the union of equivalence classes of ρ_L. Furthermore, when L is regular, the relation ρ_L has finite index. In fact, this index is the size of a smallest DFA accepting L. As a consequence, if L is regular, a simple modification of the proof of Proposition 5.9 applied to ≃ = ρ_L yields a minimal DFA D_{ρ_L} accepting L.

Then, given any trim DFA D accepting L, the equivalence relation ρ_L can be translated to an equivalence relation ≡ on states, in such a way that for all u, v ∈ Σ*,

u ρ_L v iff φ(u) ≡ φ(v),

where φ: Σ* → Q is the function (run the DFA D on u from q0) given by

φ(u) = δ*(q0, u).

One can then construct a quotient DFA D/≡ whose states are obtained by merging all states in a given equivalence class of states into a single state, and the resulting DFA D/≡ is a minimal DFA. Even though D/≡ appears to depend on D, it is in fact unique, and isomorphic to the abstract DFA D_{ρ_L} induced by ρ_L.

The last step in obtaining the minimal DFA D/≡ is to give a constructive method to compute the state equivalence relation ≡. This can be done by constructing a sequence of approximations ≡_i, where each ≡_{i+1} refines ≡_i. It turns out that if D has n states, then there is some index i0 ≤ n − 2 such that

≡_j = ≡_{i0} for all j ≥ i0 + 1,

and that

≡ = ≡_{i0}.

Furthermore, ≡_{i+1} can be computed inductively from ≡_i. In summary, we obtain an iterative algorithm for computing ≡ that terminates in at most n − 2 steps.

Definition 5.10. Given any language L (over Σ), we define the right-invariant equivalence ρ_L associated with L as the relation on Σ* defined as follows: for any two strings u, v ∈ Σ*,

u ρ_L v iff ∀w ∈ Σ* (uw ∈ L iff vw ∈ L).

It is clear that the relation ρ_L is an equivalence relation, and it is right-invariant. To show right-invariance, argue as follows: if u ρ_L v, then for any w ∈ Σ*, since u ρ_L v means that

uz ∈ L iff vz ∈ L

for all z ∈ Σ*, in particular the above equivalence holds for all z of the form z = wy for any arbitrary y ∈ Σ*, so we have

uwy ∈ L iff vwy ∈ L

for all y ∈ Σ*, which means that uw ρ_L vw.

It is also clear that L is the union of the equivalence classes of strings in L. This is because if u ∈ L and u ρ_L v, by letting w = ε in the definition of ρ_L, we get

u ∈ L iff v ∈ L,

and since u ∈ L, we also have v ∈ L. This implies that if u ∈ L then [u]_{ρ_L} ⊆ L and so,

L = ⋃_{u ∈ L} [u]_{ρ_L}.

Example 5.9. For example, consider the regular language

L = {a} ∪ {b^m | m ≥ 1}.

We leave it as an exercise to show that the equivalence relation ρ_L consists of the four equivalence classes

C1 = {ε}, C2 = {a}, C3 = {b}+, C4 = a{a,b}+ ∪ {b}+a{a,b}*

encountered earlier in Example 5.6. Observe that

L = C2 ∪ C3.

When L is regular, we have the following remarkable result:

Proposition 5.13. Given any regular language L, for any (trim) DFA D = (Q, Σ, δ, q0, F) such that L = L(D), ρ_L is a right-invariant equivalence relation, and we have ≃_D ⊆ ρ_L. Furthermore, if ρ_L has m classes and Q has n states, then m ≤ n.

Proof. By definition, u ≃_D v iff δ*(q0, u) = δ*(q0, v). Since w ∈ L(D) iff δ*(q0, w) ∈ F, the fact that u ρ_L v can be expressed as

∀w ∈ Σ* (uw ∈ L iff vw ∈ L)
iff ∀w ∈ Σ* (δ*(q0, uw) ∈ F iff δ*(q0, vw) ∈ F)
iff ∀w ∈ Σ* (δ*(δ*(q0, u), w) ∈ F iff δ*(δ*(q0, v), w) ∈ F),

and if δ*(q0, u) = δ*(q0, v), this shows that u ρ_L v. Since the number of classes of ≃_D is n and ≃_D ⊆ ρ_L, the equivalence relation ρ_L has at most as many classes as ≃_D, and m ≤ n.
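The classes of ρ_L can be approximated experimentally: bucket strings u by their (length-bounded) residual {w | uw ∈ L}, since equal residuals are exactly what u ρ_L v demands. A sketch for the language of Example 5.9 (the helper names are invented; the truncation to length ≤ k makes this an approximation that happens to be exact here for small k):

```python
from itertools import product

def in_L(w):
    """Membership in L = {a} ∪ {b^m | m >= 1} from Example 5.9."""
    return w == "a" or (len(w) >= 1 and set(w) == {"b"})

def residual_signature(u, sigma, k):
    """The truncated residual {w : |w| <= k and uw in L}; equal
    signatures approximate u rho_L v."""
    return frozenset(
        "".join(t) for n in range(k + 1) for t in product(sigma, repeat=n)
        if in_L(u + "".join(t)))

def rho_classes(sigma, max_len, k):
    """Group all strings of length <= max_len by residual signature."""
    buckets = {}
    for n in range(max_len + 1):
        for t in product(sigma, repeat=n):
            u = "".join(t)
            buckets.setdefault(residual_signature(u, sigma, k), []).append(u)
    return list(buckets.values())
```

Running `rho_classes(("a", "b"), 3, 3)` yields exactly four groups, matching the four classes C1, C2, C3, C4 of Example 5.9 restricted to short strings.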

Proposition 5.13 shows that when L is regular, the index m of ρ_L is finite, and it is a lower bound on the size of all DFA's accepting L. It remains to show that a DFA with m states accepting L exists. However, going back to the proof of Proposition 5.9 starting with the right-invariant equivalence relation ρ_L of finite index m, if L is the union of the classes C_{i1}, ..., C_{ik}, the DFA

D_{ρ_L} = ({1, ..., m}, Σ, δ, 1, {i1, ..., ik}),

where δ(i, a) = j iff C_i a ⊆ C_j, is such that L = L(D_{ρ_L}). In summary, if L is regular, then the index of ρ_L is equal to the number of states of a minimal DFA for L, and D_{ρ_L} is a minimal DFA accepting L.

Example 5.10. For example, if

L = {a} ∪ {b^m | m ≥ 1},

then we saw in Example 5.9 that ρ_L consists of the four equivalence classes

C1 = {ε}, C2 = {a}, C3 = {b}+, C4 = a{a,b}+ ∪ {b}+a{a,b}*,

and we showed in Example 5.6 that the transition table of D_{ρ_L} is given by

        a    b
  C1    C2   C3
  C2    C4   C4
  C3    C4   C3
  C4    C4   C4

By picking the final states to be C2 and C3, we obtain the minimal DFA D_{ρ_L} accepting L = {a} ∪ {b^m | m ≥ 1}.

In the next section, we give an algorithm which allows us to find D_{ρ_L}, given any DFA D accepting L. This algorithm finds which states of D are equivalent.

5.11 State Equivalence and Minimal DFA's

The proof of Proposition 5.13 suggests the following definition of an equivalence between states.

Definition 5.11. Given any DFA D = (Q, Σ, δ, q0, F), the relation ≡ on Q, called state equivalence, is defined as follows: for all p, q ∈ Q,

p ≡ q iff ∀w ∈ Σ* (δ*(p, w) ∈ F iff δ*(q, w) ∈ F). (*)

When p ≡ q, we say that p and q are indistinguishable.
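Before computing state equivalence in general, it may help to put Example 5.10 in executable form. A minimal sketch: the transition table of D_{ρ_L}, with states 1–4 standing for C1–C4 and final states {2, 3}, accepts exactly L = {a} ∪ {b^m | m ≥ 1}:

```python
# delta[(i, c)] = j  iff  C_i c ⊆ C_j   (table of Example 5.10)
delta = {(1, "a"): 2, (1, "b"): 3,
         (2, "a"): 4, (2, "b"): 4,
         (3, "a"): 4, (3, "b"): 3,
         (4, "a"): 4, (4, "b"): 4}

def accepts(w, finals=frozenset({2, 3})):
    """Run D_rho_L from start state 1 (the class of epsilon)."""
    q = 1
    for c in w:
        q = delta[(q, c)]
    return q in finals
```

Choosing instead a single final state i makes the same table accept the single class C_i, as in Proposition 5.9 (e.g., finals = {3} accepts exactly {b}+).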

It is trivial to verify that ≡ is an equivalence relation, and that it satisfies the following property:

if p ≡ q then δ(p, a) ≡ δ(q, a), for all a ∈ Σ.

To prove the above, since the condition defining ≡ must hold for all strings w ∈ Σ*, in particular it must hold for all strings of the form w = au with a ∈ Σ and u ∈ Σ*, so if p ≡ q then we have

(∀a ∈ Σ)(∀u ∈ Σ*)(δ*(p, au) ∈ F iff δ*(q, au) ∈ F)
iff (∀a ∈ Σ)(∀u ∈ Σ*)(δ*(δ*(p, a), u) ∈ F iff δ*(δ*(q, a), u) ∈ F)
iff (∀a ∈ Σ)(∀u ∈ Σ*)(δ*(δ(p, a), u) ∈ F iff δ*(δ(q, a), u) ∈ F)
iff (∀a ∈ Σ)(δ(p, a) ≡ δ(q, a)).

Since the condition defining ≡ must also hold for w = ε, we have

δ*(p, ε) ∈ F iff δ*(q, ε) ∈ F,

which, since δ*(p, ε) = p and δ*(q, ε) = q, is equivalent to

p ∈ F iff q ∈ F.

Therefore, if two states p, q are equivalent, then either both p, q ∈ F or both p, q ∉ F. This implies that a final state and a rejecting state are never equivalent.

Example 5.11. The reader should check that states A and C in the DFA below are equivalent and that no other distinct states are equivalent.

Figure 5.16: A non-minimal DFA for {a,b}*{abb}

It is illuminating to express state equivalence as the equality of two languages. Given the DFA D = (Q, Σ, δ, q0, F), let D_p = (Q, Σ, δ, p, F) be the DFA obtained from D by redefining the start state to be p. Then, it is clear that

p ≡ q iff L(D_p) = L(D_q).
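This characterization can be decided by a reachability search on pairs of states: p ≡ q fails exactly when some pair with one final and one non-final component is reachable from (p, q). A minimal sketch, assuming the transition table below is the usual subset-construction DFA for (a+b)*abb (states A–E, final state E, matching Figure 5.16 up to renaming):

```python
from collections import deque

def state_equivalent(delta, sigma, finals, p, q):
    """Decide p ≡ q by breadth-first search over pairs of states,
    looking for a reachable pair with exactly one final component."""
    seen = {(p, q)}
    queue = deque([(p, q)])
    while queue:
        u, v = queue.popleft()
        if (u in finals) != (v in finals):
            return False            # a distinguishing string exists
        for a in sigma:
            nxt = (delta[(u, a)], delta[(v, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Assumed transition table for the DFA of Figure 5.16:
delta = {("A", "a"): "B", ("A", "b"): "C",
         ("B", "a"): "B", ("B", "b"): "D",
         ("C", "a"): "B", ("C", "b"): "C",
         ("D", "a"): "B", ("D", "b"): "E",
         ("E", "a"): "B", ("E", "b"): "C"}
```

This is exactly the cross-product idea described next, with both difference automata explored in a single search, and it runs in time polynomial in the number of states.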

This simple observation implies that there is an algorithm to test state equivalence. Indeed, we simply have to test whether the DFA's D_p and D_q accept the same language, and this can be done using the cross-product construction. Indeed, L(D_p) = L(D_q) iff L(D_p) − L(D_q) = ∅ and L(D_q) − L(D_p) = ∅. Now, if (D_p × D_q)_{1−2} denotes the cross-product DFA with starting state (p,q) and with final states F × (Q − F), and (D_p × D_q)_{2−1} denotes the cross-product DFA also with starting state (p,q) and with final states (Q − F) × F, we know that

L((D_p × D_q)_{1−2}) = L(D_p) − L(D_q) and L((D_p × D_q)_{2−1}) = L(D_q) − L(D_p),

so all we need to do is to test whether (D_p × D_q)_{1−2} and (D_p × D_q)_{2−1} accept the empty language. However, we know that this is the case iff the set of states reachable from (p,q) in (D_p × D_q)_{1−2} contains no state in F × (Q − F), and the set of states reachable from (p,q) in (D_p × D_q)_{2−1} contains no state in (Q − F) × F. Actually, the graphs of (D_p × D_q)_{1−2} and (D_p × D_q)_{2−1} are identical, so we only need to check that no state in

(F × (Q − F)) ∪ ((Q − F) × F)

is reachable from (p,q) in that graph. This algorithm to test state equivalence is not the most efficient, but it is quite reasonable (it runs in polynomial time).

If L = L(D), Theorem 5.14 below shows the relationship between ρ_L and ≈ and, more generally, between the DFA D_{ρL} and the DFA D/≈ obtained as the quotient of the DFA D modulo the equivalence relation ≈ on Q. The minimal DFA D/≈ is obtained by merging the states in each block C_i of the partition Π associated with ≈, forming states corresponding to the blocks C_i, and drawing a transition on input a from a block C_i to a block C_j of Π iff there is a transition q = δ(p,a) from any state p ∈ C_i to any state q ∈ C_j on input a. The start state is the block containing q_0, and the final states are the blocks consisting of final states.

Example. For example, consider the DFA D_1 accepting L = {ab, ba} shown in Figure 5.17. This is not a minimal DFA. In fact, 0 ≈ 2 and 3 ≈ 5.
The minimal DFA D_2 is obtained by merging the states in the equivalence class {0,2} into a single state, similarly merging the states in the equivalence class {3,5} into a single state, and drawing the transitions between equivalence classes. We obtain the DFA shown in Figure 5.18.
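The reachability test described above (explore the common graph of the two cross-product DFA's from (p,q) and look for a reachable pair with exactly one final component) is easy to sketch in code. The following Python sketch is not from the notes; the dict-based encoding delta[(state, letter)] is an assumption made for illustration.

```python
from collections import deque

def equivalent_states(delta, final, sigma, p, q):
    """p ~ q iff no pair reachable from (p, q) in the product graph
    lies in (F x (Q - F)) u ((Q - F) x F)."""
    seen = {(p, q)}
    todo = deque([(p, q)])
    while todo:
        r, s = todo.popleft()
        if (r in final) != (s in final):
            return False          # a distinguishing string exists
        for a in sigma:
            nxt = (delta[(r, a)], delta[(s, a)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True
```

Breadth-first order is not essential here; any exhaustive traversal of the reachable pairs gives the same answer.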

Figure 5.17: A non-minimal DFA D_1 for L = {ab, ba}

Figure 5.18: A minimal DFA D_2 for L = {ab, ba}

Formally, the quotient DFA D/≈ is defined such that

D/≈ = (Q/≈, Σ, δ/≈, [q_0], F/≈),

where δ/≈([p],a) = [δ(p,a)].

Theorem 5.14. For any (trim) DFA D = (Q,Σ,δ,q_0,F) accepting the regular language L = L(D), the function ϕ: Σ* → Q defined such that

ϕ(u) = δ*(q_0,u)

satisfies the property

u ρ_L v iff ϕ(u) ≈ ϕ(v) for all u,v ∈ Σ*,

and induces a bijection ϕ̂: Σ*/ρ_L → Q/≈, defined such that

ϕ̂([u]_{ρL}) = [δ*(q_0,u)].

Furthermore, we have

[u]_{ρL} a ⊆ [v]_{ρL} iff δ(ϕ(u),a) ≈ ϕ(v).

Consequently, ϕ̂ induces an isomorphism of DFA's, ϕ̂: D_{ρL} → D/≈.

Proof. Since ϕ(u) = δ*(q_0,u) and ϕ(v) = δ*(q_0,v), the fact that ϕ(u) ≈ ϕ(v) can be expressed as

∀w ∈ Σ* (δ*(δ*(q_0,u),w) ∈ F iff δ*(δ*(q_0,v),w) ∈ F)
iff ∀w ∈ Σ* (δ*(q_0,uw) ∈ F iff δ*(q_0,vw) ∈ F),

which is exactly u ρ_L v. Therefore,

u ρ_L v iff ϕ(u) ≈ ϕ(v).

From the above, we see that the equivalence class [ϕ(u)]_≈ of ϕ(u) does not depend on the choice of the representative in the equivalence class [u]_{ρL} of u ∈ Σ*, since for any v ∈ Σ*, if u ρ_L v then ϕ(u) ≈ ϕ(v), so [ϕ(u)]_≈ = [ϕ(v)]_≈. Therefore, the function ϕ: Σ* → Q maps each equivalence class [u]_{ρL} modulo ρ_L to the equivalence class [ϕ(u)]_≈ modulo ≈, and so the function ϕ̂: Σ*/ρ_L → Q/≈ given by

ϕ̂([u]_{ρL}) = [ϕ(u)]_≈ = [δ*(q_0,u)]_≈

is well-defined. Moreover, ϕ̂ is injective, since ϕ̂([u]) = ϕ̂([v]) iff ϕ(u) ≈ ϕ(v) iff (from above) u ρ_L v iff [u] = [v]. Since every state in Q is accessible, for every q ∈ Q, there is some u ∈ Σ* so that ϕ(u) = δ*(q_0,u) = q, so ϕ̂([u]) = [q]_≈ and ϕ̂ is surjective. Therefore, we have a bijection ϕ̂: Σ*/ρ_L → Q/≈.

Since ϕ(u) = δ*(q_0,u), we have

δ(ϕ(u),a) = δ(δ*(q_0,u),a) = δ*(q_0,ua) = ϕ(ua),

and thus, δ(ϕ(u),a) ≈ ϕ(v) can be expressed as ϕ(ua) ≈ ϕ(v). By the previous part, this is equivalent to ua ρ_L v, and we claim that this is equivalent to [u]_{ρL} a ⊆ [v]_{ρL}. First, if [u]_{ρL} a ⊆ [v]_{ρL}, then ua ∈ [v]_{ρL}, that is, ua ρ_L v. Conversely, if ua ρ_L v, then for every u′ ∈ [u]_{ρL}, we have u′ ρ_L u, so by right-invariance we get u′a ρ_L ua, and since ua ρ_L v, we get u′a ρ_L v, that is, u′a ∈ [v]_{ρL}. Since u′ ∈ [u]_{ρL} is arbitrary, we conclude that [u]_{ρL} a ⊆ [v]_{ρL}. Therefore, we proved that

δ(ϕ(u),a) ≈ ϕ(v) iff [u]_{ρL} a ⊆ [v]_{ρL}.

The above shows that the transitions of D_{ρL} correspond to the transitions of D/≈.

Theorem 5.14 shows that the DFA D_{ρL} is isomorphic to the DFA D/≈ obtained as the quotient of the DFA D modulo the equivalence relation ≈ on Q. Since D_{ρL} is a minimal DFA accepting L, so is D/≈.

Example. Consider the DFA D (transition diagram omitted in this transcription) with start state 1 and final states 2 and 3. It is easy to see that L(D) = {a} ∪ {b^m | m ≥ 1}. It is not hard to check that states 4 and 5 are equivalent, and no other pairs of distinct states are equivalent. The quotient DFA D/≈ is obtained by merging states 4 and 5, and we obtain a minimal DFA with start state 1 and final states 2 and 3. This DFA is isomorphic to the DFA D_{ρL} of the example at the beginning of this section.

There are other characterizations of the regular languages. Among those, the characterization in terms of right derivatives is of particular interest because it yields an alternative construction of minimal DFA's.

Definition. Given any language L ⊆ Σ*, for any string u ∈ Σ*, the right derivative of L by u, denoted L/u, is the language

L/u = {w ∈ Σ* | uw ∈ L}.

Theorem 5.15. If L ⊆ Σ* is any language, then L is regular iff it has finitely many right derivatives. Furthermore, if L is regular, then all its right derivatives are regular and their number is equal to the number of states of the minimal DFA's for L.

Proof. It is easy to check that

L/u = L/v iff u ρ_L v.

The above shows that ρ_L has a finite number of classes, say m, iff there is a finite number of right derivatives, say n, and if so, m = n. If L is regular, then we know that the number of equivalence classes of ρ_L is the number of states of the minimal DFA's for L, so the number of right derivatives of L is equal to the size of the minimal DFA's for L. Conversely, if the number of derivatives is finite, say m, then ρ_L has m classes and by Myhill–Nerode, L is regular. It remains to show that if L is regular then every right derivative is regular. Let D = (Q,Σ,δ,q_0,F) be a DFA accepting L. If p = δ*(q_0,u), then let D_p = (Q,Σ,δ,p,F), that is, D with p as start state. It is clear that L/u = L(D_p), so L/u is regular for every u ∈ Σ*. Also observe that if |Q| = n, then there are at most n DFA's D_p, so there are at most n right derivatives, which is another proof of the fact that a regular language has a finite number of right derivatives.

If L is regular, then the construction of a minimal DFA for L can be recast in terms of right derivatives. Let L/u_1, L/u_2, ..., L/u_m be the set of all the right derivatives of L. Of course, we may assume that u_1 = ε. We form the DFA whose states are the right derivatives L/u_i. For every state L/u_i, for every a ∈ Σ, there is a transition on input a from L/u_i to L/u_j = L/(u_i a). The start state is L = L/u_1 and the final states are the right derivatives L/u_i for which ε ∈ L/u_i. We leave it as an exercise to check that the above DFA accepts L. One way to do this is to recall that L/u = L/v iff u ρ_L v and to observe that the above construction mimics the construction of D_{ρL} as in the Myhill–Nerode proposition (Proposition 5.9). This DFA is minimal since the number of right derivatives is equal to the size of the minimal DFA's for L.

We now return to state equivalence. Note that if F = ∅, then ≈ has a single block (Q), and if F = Q, then ≈ has a single block (F). In the first case, the minimal DFA is the one-state DFA rejecting all strings.
In the second case, the minimal DFA is the one-state DFA accepting all strings. When F ≠ ∅ and F ≠ Q, there are at least two states in Q, and ≈ also has at least two blocks, as we shall see shortly. It remains to compute ≈ explicitly. This is done using a sequence of approximations. In view of the previous discussion, we are assuming that F ≠ ∅ and F ≠ Q, which means that n ≥ 2, where n is the number of states in Q.
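As an aside, the right-derivative construction of Theorem 5.15 can be tried out concretely when L is finite, since a finite language has only finitely many right derivatives, each computable by stripping a leading letter. The following Python sketch is an illustration, not part of the notes; the set-of-strings representation of L is an assumption.

```python
def derivative(L, a):
    """Right derivative L/a = { w | a.w in L }, for L a finite set of strings."""
    return frozenset(w[1:] for w in L if w[:1] == a)

def dfa_from_derivatives(L, sigma):
    """Minimal DFA of a finite language: the states are the right
    derivatives of L, the start state is L = L/epsilon, and the final
    states are the derivatives containing the empty string."""
    start = frozenset(L)
    states, todo, delta = {start}, [start], {}
    while todo:
        S = todo.pop()
        for a in sigma:
            T = derivative(S, a)
            delta[(S, a)] = T
            if T not in states:
                states.add(T)
                todo.append(T)
    final = {S for S in states if "" in S}
    return states, delta, start, final
```

For L = {ab, ba}, the derivatives are L, {b}, {a}, {ε} and ∅, so the resulting DFA has five states (the dead state ∅ included), in line with the minimal DFA D_2 above once the dead state is drawn explicitly.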

Definition. Given any DFA D = (Q,Σ,δ,q_0,F), for every i ≥ 0, the relation ≈_i on Q, called i-state equivalence, is defined as follows: for all p,q ∈ Q,

p ≈_i q iff ∀w ∈ Σ*, |w| ≤ i (δ*(p,w) ∈ F iff δ*(q,w) ∈ F).

When p ≈_i q, we say that p and q are i-indistinguishable.

Since state equivalence is defined such that

p ≈ q iff ∀w ∈ Σ* (δ*(p,w) ∈ F iff δ*(q,w) ∈ F),

we note that testing the condition

δ*(p,w) ∈ F iff δ*(q,w) ∈ F

for all strings in Σ* is equivalent to testing the above condition for all strings of length at most i, for all i ≥ 0, i.e.,

p ≈ q iff ∀i ≥ 0 ∀w ∈ Σ*, |w| ≤ i (δ*(p,w) ∈ F iff δ*(q,w) ∈ F).

Since ≈_i is defined such that

p ≈_i q iff ∀w ∈ Σ*, |w| ≤ i (δ*(p,w) ∈ F iff δ*(q,w) ∈ F),

we conclude that

p ≈ q iff ∀i ≥ 0 (p ≈_i q).

This identity can also be expressed as

≈ = ⋂_{i≥0} ≈_i.

If we assume that F ≠ ∅ and F ≠ Q, observe that ≈_0 has exactly two equivalence classes F and Q − F, since ε is the only string of length 0, and since the condition

δ*(p,ε) ∈ F iff δ*(q,ε) ∈ F

is equivalent to the condition p ∈ F iff q ∈ F. It is also obvious from the definition of ≈_i that

≈_{i+1} ⊆ ≈_i ⊆ ··· ⊆ ≈_1 ⊆ ≈_0.

If this sequence were strictly decreasing for all i ≥ 0, the partition associated with ≈_{i+1} would contain at least one more block than the partition associated with ≈_i, and since we start with a partition with two blocks, the partition associated with ≈_i would have at least i + 2 blocks.

But then, for i = n − 1, the partition associated with ≈_{n−1} would have at least n + 1 blocks, which is absurd since Q has only n states. Therefore, there is a smallest integer i_0 ≤ n − 2 such that ≈_{i_0+1} = ≈_{i_0}. Thus, it remains to compute ≈_{i+1} from ≈_i, which can be done using the following proposition. The proposition also shows that ≈ = ≈_{i_0}.

Proposition 5.16. For any (trim) DFA D = (Q,Σ,δ,q_0,F), for all p,q ∈ Q,

p ≈_{i+1} q iff p ≈_i q and δ(p,a) ≈_i δ(q,a), for every a ∈ Σ.

Furthermore, if F ≠ ∅ and F ≠ Q, there is a smallest integer i_0 ≤ n − 2 such that

≈_{i_0+1} = ≈_{i_0} = ≈.

Proof. By the definition of the relation ≈_i,

p ≈_{i+1} q iff ∀w ∈ Σ*, |w| ≤ i+1 (δ*(p,w) ∈ F iff δ*(q,w) ∈ F).

The trick is to observe that the condition

δ*(p,w) ∈ F iff δ*(q,w) ∈ F

holds for all strings of length at most i+1 iff it holds for all strings of length at most i and for all strings of length between 1 and i+1. This is expressed as

p ≈_{i+1} q iff ∀w ∈ Σ*, |w| ≤ i (δ*(p,w) ∈ F iff δ*(q,w) ∈ F)
and ∀w ∈ Σ*, 1 ≤ |w| ≤ i+1 (δ*(p,w) ∈ F iff δ*(q,w) ∈ F).

Obviously, the first condition in the conjunction is p ≈_i q, and since every string w such that 1 ≤ |w| ≤ i+1 can be written as au where a ∈ Σ and 0 ≤ |u| ≤ i, the second condition in the conjunction can be written as

∀a ∈ Σ ∀u ∈ Σ*, |u| ≤ i (δ*(p,au) ∈ F iff δ*(q,au) ∈ F).

However, δ*(p,au) = δ*(δ(p,a),u) and δ*(q,au) = δ*(δ(q,a),u), so that the above condition is really

∀a ∈ Σ (δ(p,a) ≈_i δ(q,a)).

Thus, we showed that

p ≈_{i+1} q iff p ≈_i q and ∀a ∈ Σ (δ(p,a) ≈_i δ(q,a)).

We claim that if ≈_{i+1} = ≈_i for some i ≥ 0, then ≈_{i+j} = ≈_i for all j ≥ 1. This claim is proved by induction on j. For the base case j = 1, the claim is that ≈_{i+1} = ≈_i, which is the hypothesis. Assume inductively that ≈_{i+j} = ≈_i for some j ≥ 1. Since

p ≈_{i+j+1} q iff p ≈_{i+j} q and δ(p,a) ≈_{i+j} δ(q,a), for every a ∈ Σ,

and since by the induction hypothesis ≈_{i+j} = ≈_i, we obtain

p ≈_{i+j+1} q iff p ≈_i q and δ(p,a) ≈_i δ(q,a), for every a ∈ Σ,

which is equivalent to p ≈_{i+1} q, and thus ≈_{i+j+1} = ≈_{i+1}. But ≈_{i+1} = ≈_i, so ≈_{i+j+1} = ≈_i, establishing the induction step.

Since ≈ = ⋂_{i≥0} ≈_i and ≈_{i+1} ⊆ ≈_i, and since we know that there is a smallest index, say i_0, such that ≈_j = ≈_{i_0} for all j ≥ i_0 + 1, we have

≈ = ⋂_{i=0}^{i_0} ≈_i = ≈_{i_0}.

Using Proposition 5.16, we can compute ≈ inductively, starting from ≈_0 = (F, Q − F), and computing ≈_{i+1} from ≈_i, until the sequence of partitions associated with the ≈_i stabilizes. Note that if F = Q or F = ∅, then ≈ = ≈_0, and the inductive characterization of Proposition 5.16 holds trivially.

There are a number of algorithms for computing ≈, or for determining whether p ≈ q for some given p,q ∈ Q. A simple method to compute ≈ is described in Hopcroft and Ullman. The basic idea is to propagate inequivalence, rather than equivalence. The method consists in forming a triangular array corresponding to all unordered pairs (p,q) with p ≠ q (the rows and the columns of this triangular array are indexed by the states in Q, where the entries are below the descending diagonal). Initially, the entry for (p,q) is marked iff p and q are not 0-equivalent, which means that exactly one of p,q is in F. Then, we process every unmarked entry on every row as follows: for any unmarked pair (p,q), we consider the pairs (δ(p,a),δ(q,a)), for all a ∈ Σ. If any pair (δ(p,a),δ(q,a)) is already marked, this means that δ(p,a) and δ(q,a) are inequivalent, and thus p and q are inequivalent, and we mark the pair (p,q). We continue in this fashion, until at the end of a round during which all the rows are processed, nothing has changed. When the algorithm stops, all marked pairs are inequivalent, and all unmarked pairs correspond to equivalent states.
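The inductive computation of ≈ via Proposition 5.16 can be sketched directly: keep a block id per state, and split blocks using the ids of the successor states until a round splits nothing. This Python sketch (dict-based δ is an assumption) is for illustration only.

```python
def state_equivalence(states, sigma, delta, final):
    """Compute ~ by the approximations ~0, ~1, ...: return a dict
    mapping each state to the id of its block in the final partition."""
    block = {q: int(q in final) for q in states}          # ~0: F vs Q - F
    while True:
        # p ~_{i+1} q iff p ~_i q and delta(p,a) ~_i delta(q,a) for all a
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in sigma)
               for q in states}
        ids, new = {}, {}
        for q in states:
            new[q] = ids.setdefault(sig[q], len(ids))
        if len(set(new.values())) == len(set(block.values())):
            return new                                    # stabilized: this is ~
        block = new
```

Since refinement only ever splits blocks, the partition is stable as soon as a round leaves the number of blocks unchanged, which happens after at most n − 2 rounds by the proposition.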
Let us illustrate the above method.

Example. Consider the following DFA accepting {a,b}*{abb}:

     a   b
A    B   C
B    B   D
C    B   C
D    B   E
E    B   C

The start state is A, and the set of final states is F = {E}. (This is the DFA displayed in Figure 5.9.) The initial (half) array is as follows, using × to indicate that the corresponding pair (say, (E,A)) consists of inequivalent states, and a blank to indicate that nothing is known yet:

B
C
D
E   ×   ×   ×   ×
    A   B   C   D

After the first round, we have

B
C
D   ×   ×   ×
E   ×   ×   ×   ×
    A   B   C   D

After the second round, we have

B   ×
C       ×
D   ×   ×   ×
E   ×   ×   ×   ×
    A   B   C   D

Finally, nothing changes during the third round, and thus, only A and C are equivalent, and we get the four equivalence classes

({A,C},{B},{D},{E}).
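The marking method just illustrated can be sketched over unordered pairs, propagating inequivalence round by round exactly as in the example. This Python sketch is an illustration, not the notes' pseudocode; the dict-based δ is an assumption.

```python
from itertools import combinations

def inequivalent_pairs(states, sigma, delta, final):
    """Hopcroft-Ullman marking: mark the pairs that are not
    0-equivalent, then propagate marks until a round changes nothing.
    Unmarked pairs are exactly the pairs of equivalent states."""
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in final) != (q in final)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in sigma:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)   # successors inequivalent, so p, q are
                    changed = True
                    break
    return marked
```

On the DFA of the example, every pair except {A,C} ends up marked, reproducing the four classes ({A,C},{B},{D},{E}).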

We obtain the minimal DFA shown in Figure 5.19.

Figure 5.19: A minimal DFA accepting {a,b}*{abb}

There are ways of improving the efficiency of this algorithm; see Hopcroft and Ullman for such improvements. Fast algorithms for testing whether p ≈ q for some given p,q ∈ Q also exist. One of these algorithms is based on forward closures, following an idea of Knuth. Such an algorithm is related to a fast unification algorithm; see Section 5.13.

5.12 The Pumping Lemma

Another useful tool for proving that languages are not regular is the so-called pumping lemma.

Proposition 5.17. (Pumping lemma) Given any DFA D = (Q,Σ,δ,q_0,F), there is some m ≥ 1 such that for every w ∈ Σ*, if w ∈ L(D) and |w| ≥ m, then there exists a decomposition of w as w = uxv, where

(1) x ≠ ε,
(2) ux^i v ∈ L(D), for all i ≥ 0, and
(3) |ux| ≤ m.

Moreover, m can be chosen to be the number of states of the DFA D.

Proof. Let m be the number of states in Q, and let w = w_1...w_n. Since Q contains the start state q_0, m ≥ 1. Since |w| ≥ m, we have n ≥ m. Since w ∈ L(D), let (q_0,q_1,...,q_n) be the sequence of states in the accepting computation of w (where q_n ∈ F). Consider the subsequence

(q_0,q_1,...,q_m).

This sequence contains m + 1 states, but there are only m states in Q, and thus we have q_i = q_j for some i,j such that 0 ≤ i < j ≤ m. Then, letting u = w_1...w_i, x = w_{i+1}...w_j, and v = w_{j+1}...w_n, it is clear that the conditions of the proposition hold.

An important consequence of the pumping lemma is that if a DFA D has m states and if there is some string w ∈ L(D) such that |w| ≥ m, then L(D) is infinite. Indeed, by the pumping lemma, w ∈ L(D) can be written as w = uxv with x ≠ ε, and ux^i v ∈ L(D) for all i ≥ 0. Since x ≠ ε, we have |x| > 0, so for all i,j ≥ 0 with i < j we have

|ux^i v| < |ux^i v| + (j − i)|x| = |ux^j v|,

which implies that ux^i v ≠ ux^j v for all i < j, and the set of strings {ux^i v | i ≥ 0} ⊆ L(D) is an infinite subset of L(D), so L(D) is itself infinite. As a consequence, if L(D) is finite, there are no strings w in L(D) such that |w| ≥ m. In this case, since the premise of the pumping lemma is false, the pumping lemma holds vacuously; that is, if L(D) is finite, the pumping lemma yields no information.

Another corollary of the pumping lemma is that there is a test to decide whether a DFA D accepts an infinite language L(D).

Proposition 5.18. Let D be a DFA with m states. The language L(D) accepted by D is infinite iff there is some string w ∈ L(D) such that m ≤ |w| < 2m.

If L(D) is infinite, there are strings of length ≥ m in L(D), but a priori there is no guarantee that there are short strings w in L(D), that is, strings whose length is uniformly bounded by some function of m independent of D. The pumping lemma ensures that there are such strings, and the bounding function is m ↦ 2m.

Typically, the pumping lemma is used to prove that a language is not regular. The method is to proceed by contradiction, i.e., to assume (contrary to what we wish to prove) that a language L is indeed regular, and derive a contradiction of the pumping lemma. Thus, it would be helpful to see what the negation of the pumping lemma is, and for this, we first state the pumping lemma as a logical formula.
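Before turning to the negation, note that the test of Proposition 5.18 is easy to carry out mechanically: compute, layer by layer, the set of states reachable by words of each length, and look for a final state at some length between m and 2m − 1. This Python sketch (dict-based δ is an assumption) is for illustration, not part of the notes.

```python
def is_infinite(states, sigma, delta, start, final):
    """L(D) is infinite iff it contains some w with m <= |w| < 2m,
    where m = |Q| (Proposition 5.18)."""
    m = len(states)
    layer = {start}                  # states reached by some word of length k
    for k in range(2 * m):
        if k >= m and layer & final:
            return True              # an accepted word of length in [m, 2m)
        layer = {delta[(q, a)] for q in layer for a in sigma}
    return False
```

Each layer is computed from the previous one in time proportional to |Q| · |Σ|, so the whole test runs in polynomial time.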
We will use the following abbreviations:

nat = {0,1,2,...},
pos = {1,2,...},
A ≡ w = uxv,
B ≡ x ≠ ε,
C ≡ |ux| ≤ m,
P ≡ ∀i: nat (ux^i v ∈ L(D)).

The pumping lemma can be stated as

∀D: DFA ∃m: pos ∀w: Σ* ((w ∈ L(D) ∧ |w| ≥ m) ⊃ (∃u,x,v: Σ* (A ∧ B ∧ C ∧ P))).

Recalling that

¬(A ∧ B ∧ C ∧ P) ≡ ¬(A ∧ B ∧ C) ∨ ¬P ≡ (A ∧ B ∧ C) ⊃ ¬P

and

¬(R ⊃ S) ≡ R ∧ ¬S,

the negation of the pumping lemma can be stated as

∃D: DFA ∀m: pos ∃w: Σ* ((w ∈ L(D) ∧ |w| ≥ m) ∧ (∀u,x,v: Σ* ((A ∧ B ∧ C) ⊃ ¬P))).

Since

¬P ≡ ∃i: nat (ux^i v ∉ L(D)),

in order to show that the pumping lemma is contradicted, one needs to show that for some DFA D, for every m ≥ 1, there is some string w ∈ L(D) of length at least m, such that for every possible decomposition w = uxv satisfying the constraints x ≠ ε and |ux| ≤ m, there is some i ≥ 0 such that ux^i v ∉ L(D). When proceeding by contradiction, we have a language L that we are (wrongly) assuming to be regular, and we can use any DFA D accepting L. The creative part of the argument is to pick the right w ∈ L (not making any assumption on m ≤ |w|).

As an illustration, let us use the pumping lemma to prove that L_1 = {a^n b^n | n ≥ 1} is not regular. The usefulness of the condition |ux| ≤ m lies in the fact that it reduces the number of legal decompositions uxv of w. We proceed by contradiction. Thus, let us assume that L_1 = {a^n b^n | n ≥ 1} is regular. If so, it is accepted by some DFA D. Now, we wish to contradict the pumping lemma. For every m ≥ 1, let w = a^m b^m. Clearly, w = a^m b^m ∈ L_1 and |w| ≥ m. Then, every legal decomposition u,x,v of w is such that

w = a···a  a···a  a···a b···b
    \__u__/\__x__/\_____v____/

where x ≠ ε and x ends within the a's, since |ux| ≤ m. Since x ≠ ε, the string uxxv is of the form a^n b^m where n > m, and thus uxxv ∉ L_1, contradicting the pumping lemma.

Let us consider two more examples. Let L_2 = {a^m b^n | 1 ≤ m < n}. We claim that L_2 is not regular. Our first proof uses the pumping lemma. For any m ≥ 1, pick w = a^m b^{m+1}. We have w ∈ L_2 and |w| ≥ m, so we need to contradict the pumping lemma. Every legal decomposition u,x,v of w is such that

w = a···a  a···a  a···a b···b
    \__u__/\__x__/\_____v____/

where x ≠ ε and x ends within the a's, since |ux| ≤ m. Since x ≠ ε and x consists of a's, the string ux²v = uxxv contains at least m + 1 a's and still m + 1 b's, so ux²v ∉ L_2, contradicting the pumping lemma.

Our second proof uses Myhill–Nerode. Let ≃ be a right-invariant equivalence relation of finite index such that L_2 is the union of classes of ≃. If we consider the infinite sequence

a, a², ..., aⁿ, ...,

since ≃ has a finite number of classes, there are two strings a^m and a^n with m < n such that a^m ≃ a^n. By right-invariance, by concatenating on the right with b^n, we obtain a^m b^n ≃ a^n b^n, and since m < n we have a^m b^n ∈ L_2 but a^n b^n ∉ L_2, a contradiction.

Let us now consider the language L_3 = {a^m b^n | m ≠ n}. This time let us begin by using Myhill–Nerode to prove that L_3 is not regular. The proof is the same as before; we obtain a^m b^n ≃ a^n b^n, and the contradiction is that a^m b^n ∈ L_3 and a^n b^n ∉ L_3.

Let us now try to use the pumping lemma to prove that L_3 is not regular. For any m ≥ 1 pick w = a^m b^{m+1} ∈ L_3. As in the previous case, every legal decomposition u,x,v of w is such that x ≠ ε and x ends within the a's, since |ux| ≤ m. However, this time we have a problem, namely that we know that x is a nonempty string of a's but we don't know how many, so we can't guarantee that pumping up x will yield exactly the string a^{m+1} b^{m+1}. We made the wrong choice for w. There is a choice that will work, but it is a bit tricky.

Fortunately, there is another simpler approach. Recall that the regular languages are closed under the boolean operations (union, intersection and complementation). Thus, L_3 is not regular iff its complement L̄_3 is not regular. Observe that L̄_3 contains {a^n b^n | n ≥ 1}, which we showed to be nonregular. But there is another problem, which is that L̄_3 contains other strings besides strings of the form a^n b^n, for example strings of the form b^m a^n with m,n > 0. Again, we can take care of this difficulty using the closure operations of the regular languages.
If we can find a regular language R such that L̄_3 ∩ R is not regular, then L̄_3 itself is not regular, since if L̄_3 and R were both regular, then L̄_3 ∩ R would also be regular. In our case, we can use R = {a}^+{b}^+ to obtain

L̄_3 ∩ {a}^+{b}^+ = {a^n b^n | n ≥ 1}.

Since {a^n b^n | n ≥ 1} is not regular, we reached our final contradiction. Observe how we use the language R to "clean up" L̄_3 by intersecting it with R. To complete a direct proof using the pumping lemma, the reader should try w = a^{m!} b^{(m+1)!}.

The use of the closure operations of the regular languages is often a quick way of showing that a language L is not regular, by reducing the problem of proving that L is not regular to the problem of proving that some well-known language is not regular.

5.13 A Fast Algorithm for Checking State Equivalence Using a Forward-Closure

Given two states p,q ∈ Q, if p ≈ q, then we know that δ(p,a) ≈ δ(q,a), for all a ∈ Σ. This suggests a method for testing whether two distinct states p,q are equivalent. Starting with the relation R = {(p,q)}, construct the smallest equivalence relation R† containing R with the property that whenever (r,s) ∈ R†, then (δ(r,a),δ(s,a)) ∈ R†, for all a ∈ Σ. If we ever encounter a pair (r,s) such that r ∈ F and s ∉ F, or r ∉ F and s ∈ F, then r and s are inequivalent, and so are p and q. Otherwise, it can be shown that p and q are indeed equivalent. Thus, testing for the equivalence of two states reduces to finding an efficient method for computing the forward closure of a relation defined on the set of states of a DFA. Such a method was worked out by John Hopcroft and Richard Karp and published in a 1971 Cornell technical report. This method is based on an idea of Donald Knuth for solving Exercise 11 of The Art of Computer Programming, Vol. 1, second edition, 1973; a sketch of the solution for this exercise is given on page 594. As far as I know, Hopcroft and Karp's method was never published in a journal, but a simple recursive algorithm does appear on page 144 of Aho, Hopcroft and Ullman's The Design and Analysis of Computer Algorithms, first edition, 1974. Essentially the same idea was used by Paterson and Wegman to design a fast unification algorithm (in 1978).

We make a few definitions. A relation S ⊆ Q × Q is a forward closure iff it is an equivalence relation and whenever (r,s) ∈ S, then (δ(r,a),δ(s,a)) ∈ S, for all a ∈ Σ.
The forward closure of a relation R ⊆ Q × Q is the smallest equivalence relation R† containing R which is forward closed. We say that a forward closure S is good iff whenever (r,s) ∈ S, then good(r,s), where good(r,s) holds iff either both r,s ∈ F, or both r,s ∉ F. Obviously, bad(r,s) iff ¬good(r,s). Given any relation R ⊆ Q × Q, recall that the smallest equivalence relation containing R is the relation (R ∪ R⁻¹)* (where R⁻¹ = {(q,p) | (p,q) ∈ R}, and (R ∪ R⁻¹)* is the reflexive and transitive closure of (R ∪ R⁻¹)). The forward closure of R can be computed inductively by defining the sequence of relations R_i ⊆ Q × Q as follows:

R_0 is the smallest equivalence relation containing R, and R_{i+1} is the smallest equivalence relation containing

R_i ∪ {(δ(r,a),δ(s,a)) | (r,s) ∈ R_i, a ∈ Σ}.

It is not hard to prove that R_{i_0+1} = R_{i_0} for some least i_0, and that R† = R_{i_0} is the smallest forward closure containing R. The following two facts can also be established:

(a) if R† is good, then

R† ⊆ ≈;  (5.1)

(b) if p ≈ q, then R† ⊆ ≈, that is, equation (5.1) holds, and this implies that R† is good.

As a consequence, we obtain the correctness of our procedure: p ≈ q iff the forward closure R† of the relation R = {(p,q)} is good.

In practice, we maintain the partition Π representing the equivalence relation that we are closing under forward closure. We add each new pair (δ(r,a),δ(s,a)) one at a time, and immediately form the smallest equivalence relation containing the new relation. If δ(r,a) and δ(s,a) already belong to the same block of Π, we consider another pair; else, we merge the blocks corresponding to δ(r,a) and δ(s,a), and then consider another pair. The algorithm is recursive, but it can easily be implemented using a stack.

To manipulate partitions efficiently, we represent them as lists of trees (forests). Each equivalence class C in the partition Π is represented by a tree structure consisting of nodes and parent pointers, with the pointers going from the sons of a node to the node itself. The root has a null pointer. Each node also maintains a counter keeping track of the number of nodes in the subtree rooted at that node. Note that pointers can be avoided. We can represent a forest of n nodes as a list of n pairs of the form (father, count). If (father, count) is the ith pair in the list, then father = 0 iff node i is a root node; otherwise, father is the index of the node in the list which is the parent of node i. The number count is the total number of nodes in the tree rooted at the ith node. For example, the following list of nine nodes

((0,3),(0,2),(1,1),(0,2),(0,2),(1,1),(2,1),(4,1),(5,1))

represents a forest consisting of the following four trees:

Figure 5.20: A forest of four trees

Two functions union and find are defined as follows. Given a state p, find(p,Π) finds the root of the tree containing p as a node (not necessarily a leaf). Given two root nodes p,q, union(p,q,Π) forms a new partition by merging the two trees with roots p and q as follows: if the counter of p is smaller than that of q, then let the root of p point to q, else let the root of q point to p. For example, given the two trees shown on the left in Figure 5.21, find(6,Π) returns 3 and find(8,Π) returns 4. Then union(3,4,Π) yields the tree shown on the right in Figure 5.21.

Figure 5.21: Applying the function union to the trees rooted at 3 and 4

In order to speed up the algorithm, using an idea due to Tarjan, we can modify find as follows: during a call find(p,Π), as we follow the path from p to the root r of the tree containing p, we redirect the parent pointer of every node q on the path from p (including p itself) to r (we perform path compression). For example, applying find(8,Π) to the tree shown on the right in Figure 5.21 yields the tree shown in Figure 5.22.

Figure 5.22: The result of applying find with path compression

Then, the algorithm is as follows:

function unif[p, q, Π, dd]: flag;
begin
  trans := left(dd); ff := right(dd);
  pq := (p, q); st := (pq); flag := 1;
  k := Length(first(trans));
  while st ≠ () ∧ flag ≠ 0 do
    uv := top(st); uu := left(uv); vv := right(uv);
    pop(st);
    if bad(ff, uv) = 1 then
      flag := 0
    else
      u := find(uu, Π); v := find(vv, Π);
      if u ≠ v then
        union(u, v, Π);
        for i = 1 to k do
          u1 := delta(trans, uu, k − i + 1);
          v1 := delta(trans, vv, k − i + 1);
          uv := (u1, v1);
          push(st, uv)
        endfor
      endif
    endif
  endwhile
end

The initial partition Π is the identity relation on Q, i.e., it consists of the blocks {q} for all states q ∈ Q. The algorithm uses a stack st. We are assuming that the DFA dd is specified by a list of two sublists, the first, denoted left(dd) in the pseudo-code above, being a representation of the transition function, and the second, denoted right(dd), the set of final states. The transition function itself is a list of lists, where the i-th list represents the i-th row of the transition table for dd. The function delta is such that delta(trans, i, j) returns the j-th state in the i-th row of the transition table of dd. For example, we have the DFA

dd = (((2,3),(2,4),(2,3),(2,5),(2,3),(7,6),(7,8),(7,9),(7,6)),(5,9))

consisting of 9 states labeled 1,...,9, and two final states 5 and 9, shown in Figure 5.23. Also, the alphabet has two letters, since every row in the transition table consists of two entries. For example, the two transitions from state 3 are given by the pair (2,3), which indicates that δ(3,a) = 2 and δ(3,b) = 3.

The sequence of steps performed by the algorithm starting with p = 1 and q = 6 is shown below. At every step, we show the current pair of states, the partition, and the stack.

Figure 5.23: Testing state equivalence in a DFA

p = 1, q = 6, Π = {{1,6},{2},{3},{4},{5},{7},{8},{9}}, st = {{1,6}}

Figure 5.24: Testing state equivalence in a DFA

p = 2, q = 7, Π = {{1,6},{2,7},{3},{4},{5},{8},{9}}, st = {{3,6},{2,7}}

Figure 5.25: Testing state equivalence in a DFA

p = 4, q = 8, Π = {{1,6},{2,7},{3},{4,8},{5},{9}}, st = {{3,6},{4,8}}

Figure 5.26: Testing state equivalence in a DFA

p = 5, q = 9, Π = {{1,6},{2,7},{3},{4,8},{5,9}}, st = {{3,6},{5,9}}

Figure 5.27: Testing state equivalence in a DFA

p = 3, q = 6, Π = {{1,3,6},{2,7},{4,8},{5,9}}, st = {{3,6},{3,6}}

Since states 3 and 6 belong to the first block of the partition, the algorithm terminates. Since no block of the partition contains a bad pair, the states p = 1 and q = 6 are equivalent.

Let us now test whether the states p = 3 and q = 7 are equivalent.

Figure 5.28: Testing state equivalence in a DFA

p = 3, q = 7, Π = {{1},{2},{3,7},{4},{5},{6},{8},{9}}, st = {{3,7}}

Figure 5.29: Testing state equivalence in a DFA

p = 2, q = 7, Π = {{1},{2,3,7},{4},{5},{6},{8},{9}}, st = {{3,8},{2,7}}

Figure 5.30: Testing state equivalence in a DFA

p = 4, q = 8, Π = {{1},{2,3,7},{4,8},{5},{6},{9}}, st = {{3,8},{4,8}}

Figure 5.31: Testing state equivalence in a DFA

p = 5, q = 9, Π = {{1},{2,3,7},{4,8},{5,9},{6}}, st = {{3,8},{5,9}}

Figure 5.32: Testing state equivalence in a DFA

p = 3, q = 6, Π = {{1},{2,3,6,7},{4,8},{5,9}}, st = {{3,8},{3,6}}

Figure 5.33: Testing state equivalence in a DFA

p = 3, q = 8, Π = {{1},{2,3,4,6,7,8},{5,9}}, st = {{3,8}}

Figure 5.34: Testing state equivalence in a DFA

p = 3, q = 9, Π = {{1},{2,3,4,6,7,8},{5,9}}, st = {{3,9}}

Since the pair (3,9) is a bad pair, the algorithm stops, and the states p = 3 and q = 7 are inequivalent.
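The stack-and-partition procedure above can be sketched compactly. The following Python sketch follows the same idea (forward-close {(p,q)} with a union-find partition, failing as soon as a bad pair is popped); the dict-based δ and the path-halving variant of find are assumptions of this sketch, not the notes' exact data layout.

```python
def hk_equivalent(delta, final, sigma, p, q):
    """Hopcroft-Karp style equivalence test for two DFA states."""
    parent = {}

    def find(x):                      # root of x's block, with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    stack = [(p, q)]
    while stack:
        u, v = stack.pop()
        if (u in final) != (v in final):
            return False              # bad pair: p and q are inequivalent
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv           # merge the two blocks
            for a in sigma:
                stack.append((delta[(u, a)], delta[(v, a)]))
    return True
```

Run on the 9-state DFA dd above, this reproduces the two traces: states 1 and 6 are equivalent, while states 3 and 7 are not.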


Chapter 6

Context-Free Grammars, Context-Free Languages, Parse Trees and Ogden's Lemma

6.1 Context-Free Grammars

A context-free grammar basically consists of a finite set of grammar rules. In order to define grammar rules, we assume that we have two kinds of symbols: the terminals, which are the symbols of the alphabet underlying the languages under consideration, and the nonterminals, which behave like variables ranging over strings of terminals. A rule is of the form A → α, where A is a single nonterminal, and the right-hand side α is a string of terminal and/or nonterminal symbols. As usual, first we need to define what the object is (a context-free grammar), and then we need to explain how it is used. Unlike automata, grammars are used to generate strings, rather than recognize strings.

Definition 6.1. A context-free grammar (for short, CFG) is a quadruple G = (V,Σ,P,S), where

V is a finite set of symbols called the vocabulary (or set of grammar symbols);
Σ ⊆ V is the set of terminal symbols (for short, terminals);
S ∈ (V − Σ) is a designated symbol called the start symbol;
P ⊆ (V − Σ) × V* is a finite set of productions (or rewrite rules, or rules).

The set N = V − Σ is called the set of nonterminal symbols (for short, nonterminals). Thus, P ⊆ N × V*, and every production ⟨A,α⟩ is also denoted as A → α. A production of the form A → ε is called an epsilon rule, or null rule.

Remark: Context-free grammars are sometimes defined as G = (V_N, V_T, P, S). The correspondence with our definition is that Σ = V_T and N = V_N, so that V = V_N ∪ V_T. Thus, in this other definition, it is necessary to assume that V_T ∩ V_N = ∅.

Example 1. G_1 = ({E,a,b},{a,b},P,E), where P is the set of rules

E → aEb,
E → ab.

As we will see shortly, this grammar generates the language L_1 = {a^n b^n | n ≥ 1}, which is not regular.

Example 2. G_2 = ({E,+,∗,(,),a},{+,∗,(,),a},P,E), where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a.

This grammar generates a set of arithmetic expressions.

6.2 Derivations and Context-Free Languages

The productions of a grammar are used to derive strings. In this process, the productions are used as rewrite rules. Formally, we define the derivation relation associated with a context-free grammar. First, let us review the concepts of transitive closure and reflexive and transitive closure of a binary relation. Given a set A, a binary relation R on A is any set of ordered pairs, i.e. R ⊆ A × A. For short, instead of a binary relation, we often simply say relation. Given any two relations R, S on A, their composition R ∘ S is defined as

R ∘ S = {(x,y) ∈ A × A | ∃z ∈ A, (x,z) ∈ R and (z,y) ∈ S}.

The identity relation I_A on A is the relation defined such that

I_A = {(x,x) | x ∈ A}.

For short, we often denote I_A as I. Note that

R ∘ I = I ∘ R = R

for every relation R on A. Given a relation R on A, for any n ≥ 0 we define R^n as follows:

R^0 = I,
R^{n+1} = R^n ∘ R.
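Composition and the powers of a relation are directly computable when A is finite. The following Python sketch is an illustration of these definitions (relations as sets of pairs), not notation from the notes:

```python
def compose(R, S):
    """R o S = { (x, y) | there is z with (x, z) in R and (z, y) in S }."""
    return {(x, y) for (x, z) in R for (z2, y) in S if z == z2}

def transitive_closure(R):
    """R+ as the union of the powers R^n for n >= 1,
    iterated until a fixed point is reached."""
    closure = set(R)
    while True:
        bigger = closure | compose(closure, R)
        if bigger == closure:
            return closure
        closure = bigger
```

Adding the identity pairs of the underlying set to the result would give the reflexive and transitive closure R* defined next.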

It is obvious that R^1 = R. It is also easily verified by induction that R^n ◦ R = R ◦ R^n. The transitive closure R^+ of the relation R is defined as

R^+ = ⋃_{n≥1} R^n.

It is easily verified that R^+ is the smallest transitive relation containing R, and that (x, y) ∈ R^+ iff there is some n ≥ 1 and some x_0, x_1, ..., x_n ∈ A such that x_0 = x, x_n = y, and (x_i, x_{i+1}) ∈ R for all i, 0 ≤ i ≤ n − 1. The transitive and reflexive closure R^* of the relation R is defined as

R^* = ⋃_{n≥0} R^n.

Clearly, R^* = R^+ ∪ I. It is easily verified that R^* is the smallest transitive and reflexive relation containing R.

Definition 6.2. Given a context-free grammar G = (V, Σ, P, S), the (one-step) derivation relation =⇒_G associated with G is the binary relation =⇒_G ⊆ V* × V* defined as follows: for all α, β ∈ V*, we have

α =⇒_G β

iff there exist λ, ρ ∈ V*, and some production (A → γ) ∈ P, such that

α = λAρ and β = λγρ.

The transitive closure of =⇒_G is denoted as =⇒_G^+ and the reflexive and transitive closure of =⇒_G is denoted as =⇒_G^*. When the grammar G is clear from the context, we usually omit the subscript G in =⇒_G, =⇒_G^+, and =⇒_G^*.

A string α ∈ V* such that S =⇒* α is called a sentential form, and a string w ∈ Σ* such that S =⇒* w is called a sentence. A derivation α =⇒* β involving n steps is denoted as α =⇒^n β.

Note that a derivation step α =⇒_G β is rather nondeterministic. Indeed, one can choose among various occurrences of nonterminals A in α, and also among various productions A → γ with left-hand side A.

For example, using the grammar G1 = ({E, a, b}, {a, b}, P, E), where P is the set of rules

E → aEb,
E → ab,

every derivation from E is of the form

E =⇒^n a^n E b^n =⇒ a^n ab b^n = a^{n+1} b^{n+1},

or

E =⇒^n a^n E b^n =⇒ a^n aEb b^n = a^{n+1} E b^{n+1},

where n ≥ 0. Grammar G1 is very simple: every string a^n b^n has a unique derivation. This is usually not the case. For example, using the grammar G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a,

the string a + a ∗ a has the following distinct derivations, where the boldface indicates which occurrence of E is rewritten:

E =⇒ E ∗ E =⇒ E + E ∗ E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,

and

E =⇒ E + E =⇒ a + E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.

In the above derivations, the leftmost occurrence of a nonterminal is chosen at each step. Such derivations are called leftmost derivations. We could systematically rewrite the rightmost occurrence of a nonterminal, getting rightmost derivations. The string a + a ∗ a also has the following two rightmost derivations, where the boldface indicates which occurrence of E is rewritten:

E =⇒ E + E =⇒ E + E ∗ E =⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a,

and

E =⇒ E ∗ E =⇒ E ∗ a =⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a.

The language generated by a context-free grammar is defined as follows.

Definition 6.3. Given a context-free grammar G = (V, Σ, P, S), the language generated by G is the set

L(G) = {w ∈ Σ* | S =⇒+ w}.

A language L ⊆ Σ* is a context-free language (for short, CFL) iff L = L(G) for some context-free grammar G.

It is technically very useful to consider derivations in which the leftmost nonterminal is always selected for rewriting, and dually, derivations in which the rightmost nonterminal is always selected for rewriting.

Definition 6.4. Given a context-free grammar G = (V, Σ, P, S), the (one-step) leftmost derivation relation =⇒_lm associated with G is the binary relation =⇒_lm ⊆ V* × V* defined as follows: for all α, β ∈ V*, we have

α =⇒_lm β

iff there exist u ∈ Σ*, ρ ∈ V*, and some production (A → γ) ∈ P, such that

α = uAρ and β = uγρ.

The transitive closure of =⇒_lm is denoted as =⇒_lm^+ and the reflexive and transitive closure of =⇒_lm is denoted as =⇒_lm^*. The (one-step) rightmost derivation relation =⇒_rm associated with G is the binary relation =⇒_rm ⊆ V* × V* defined as follows: for all α, β ∈ V*, we have

α =⇒_rm β

iff there exist λ ∈ V*, v ∈ Σ*, and some production (A → γ) ∈ P, such that

α = λAv and β = λγv.

The transitive closure of =⇒_rm is denoted as =⇒_rm^+ and the reflexive and transitive closure of =⇒_rm is denoted as =⇒_rm^*.

Remarks: It is customary to use the symbols a, b, c, d, e for terminal symbols, and the symbols A, B, C, D, E for nonterminal symbols. The symbols u, v, w, x, y, z denote terminal strings, and the symbols α, β, γ, λ, ρ, µ denote strings in V*. The symbols X, Y, Z usually denote symbols in V.

Given a context-free grammar G = (V, Σ, P, S), parsing a string w consists in finding out whether w ∈ L(G), and if so, in producing a derivation for w.

The following proposition is technically very important. It shows that leftmost and rightmost derivations are universal. This has some important practical implications for the complexity of parsing algorithms.
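The one-step leftmost derivation relation is easy to mechanize. A minimal sketch (the tuple encoding of sentential forms and the function name are choices made for this example, not from the text): only the leftmost nonterminal is rewritten, once per applicable production.

```python
def leftmost_steps(form, productions, terminals):
    """All β with form =⇒_lm β, i.e. rewrite only the leftmost nonterminal.

    Sentential forms are tuples of symbols; productions are (lhs, rhs)
    pairs where rhs is itself a tuple of symbols.
    """
    for i, sym in enumerate(form):
        if sym not in terminals:                     # leftmost nonterminal
            return [form[:i] + rhs + form[i + 1:]
                    for (lhs, rhs) in productions if lhs == sym]
    return []                                        # form is a sentence

# Grammar G1: E → aEb, E → ab
G1 = [("E", ("a", "E", "b")), ("E", ("a", "b"))]
```

For example, `leftmost_steps(("E",), G1, {"a", "b"})` returns the two forms `aEb` and `ab`, matching the two G1 productions.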

Proposition 6.1. Let G = (V, Σ, P, S) be a context-free grammar. For every w ∈ Σ*, for every derivation S =⇒+ w, there is a leftmost derivation S =⇒_lm^+ w, and there is a rightmost derivation S =⇒_rm^+ w.

Proof. Of course, we have to somehow use induction on derivations, but this is a little tricky, and it is necessary to prove a stronger fact. We treat leftmost derivations, rightmost derivations being handled in a similar way.

Claim: For every w ∈ Σ*, for every α ∈ V^+, for every n ≥ 1, if α =⇒^n w, then there is a leftmost derivation α =⇒_lm^n w.

The claim is proved by induction on n. For n = 1, there exist some λ, ρ ∈ V* and some production A → γ, such that α = λAρ and w = λγρ. Since w is a terminal string, λ, ρ, and γ are terminal strings. Thus, A is the only nonterminal in α, and the derivation step α =⇒^1 w is a leftmost step (and a rightmost step!).

If n > 1, then the derivation α =⇒^n w is of the form

α =⇒ α_1 =⇒^{n−1} w.

There are two subcases.

Case 1. If the derivation step α =⇒ α_1 is a leftmost step α =⇒_lm α_1, by the induction hypothesis, there is a leftmost derivation α_1 =⇒_lm^{n−1} w, and we get the leftmost derivation

α =⇒_lm α_1 =⇒_lm^{n−1} w.

Case 2. The derivation step α =⇒ α_1 is not a leftmost step. In this case, there must be some u ∈ Σ*, µ, ρ ∈ V*, some nonterminals A and B, and some production B → δ, such that

α = uAµBρ and α_1 = uAµδρ,

where A is the leftmost nonterminal in α. Since we have a derivation α_1 =⇒^{n−1} w of length n − 1, by the induction hypothesis, there is a leftmost derivation

α_1 =⇒_lm^{n−1} w.

Since α_1 = uAµδρ where A is the leftmost nonterminal in α_1, the first step in the leftmost derivation α_1 =⇒_lm^{n−1} w is of the form

uAµδρ =⇒_lm uγµδρ,

for some production A → γ. Thus, we have a derivation of the form

α = uAµBρ =⇒ uAµδρ =⇒_lm uγµδρ =⇒_lm^{n−2} w.

We can commute the first two steps involving the productions B → δ and A → γ, and we get the derivation

α = uAµBρ =⇒_lm uγµBρ =⇒ uγµδρ =⇒_lm^{n−2} w.

This may no longer be a leftmost derivation, but the first step is leftmost, and we are back in case 1. Thus, we conclude by applying the induction hypothesis to the derivation uγµBρ =⇒^{n−1} w, as in case 1.

Proposition 6.1 implies that

L(G) = {w ∈ Σ* | S =⇒_lm^+ w} = {w ∈ Σ* | S =⇒_rm^+ w}.

We observed that if we consider the grammar G2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a,

the string a + a ∗ a has the following two distinct leftmost derivations, where the boldface indicates which occurrence of E is rewritten:

E =⇒ E ∗ E =⇒ E + E ∗ E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,

and

E =⇒ E + E =⇒ a + E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.

When this happens, we say that we have an ambiguous grammar. In some cases, it is possible to modify a grammar to make it unambiguous. For example, the grammar G2 can be modified as follows. Let G3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + T,
E → T,
T → T ∗ F,
T → F,
F → (E),
F → a.
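The two leftmost derivations of a + a ∗ a can be found by brute-force search. A sketch (illustrative, not from the text), exploiting that G2 has no ǫ-rules, so sentential forms never shrink and the search space is finite:

```python
def count_leftmost_derivations(target, start, productions, terminals):
    """Count distinct leftmost derivations of `target`; assumes every
    right-hand side is nonempty, so sentential forms never get shorter."""
    target = tuple(target)
    count = 0

    def expand(form):
        nonlocal count
        i = next((j for j, s in enumerate(form) if s not in terminals), None)
        if i is None:                          # all terminals: a sentence
            count += form == target
            return
        # prune: form too long, or its terminal prefix already disagrees
        if len(form) > len(target) or form[:i] != target[:i]:
            return
        for lhs, rhs in productions:
            if lhs == form[i]:
                expand(form[:i] + rhs + form[i + 1:])

    expand((start,))
    return count

# Grammar G2: E → E+E | E*E | (E) | a
G2 = [("E", ("E", "+", "E")), ("E", ("E", "*", "E")),
      ("E", ("(", "E", ")")), ("E", ("a",))]
```

Since distinct leftmost derivations correspond to distinct parse trees, the count for "a+a*a" is 2, while a string like "a+a" has a single leftmost derivation.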

We leave as an exercise to show that L(G3) = L(G2), and that every string in L(G3) has a unique leftmost derivation. Unfortunately, it is not always possible to modify a context-free grammar to make it unambiguous. There exist context-free languages that have no unambiguous context-free grammars. For example, the language

L3 = {a^m b^m c^n | m, n ≥ 1} ∪ {a^m b^n c^n | m, n ≥ 1}

is context-free, since it is generated by the following context-free grammar:

S → S1,
S → S2,
S1 → XC,
S2 → AY,
X → aXb,
X → ab,
Y → bYc,
Y → bc,
A → aA,
A → a,
C → cC,
C → c.

However, it can be shown that L3 has no unambiguous grammars. All this motivates the following definition.

Definition 6.5. A context-free grammar G = (V, Σ, P, S) is ambiguous if there is some string w ∈ L(G) that has two distinct leftmost derivations (or two distinct rightmost derivations). Thus, a grammar G is unambiguous if every string w ∈ L(G) has a unique leftmost derivation (or a unique rightmost derivation). A context-free language L is inherently ambiguous if every CFG G for L is ambiguous.

Whether or not a grammar is ambiguous affects the complexity of parsing. Parsing algorithms for unambiguous grammars are more efficient than parsing algorithms for ambiguous grammars.

We now consider various normal forms for context-free grammars.

6.3 Normal Forms for Context-Free Grammars, Chomsky Normal Form

One of the main goals of this section is to show that every CFG G can be converted to an equivalent grammar in Chomsky Normal Form (for short, CNF). A context-free grammar

G = (V, Σ, P, S) is in Chomsky Normal Form iff its productions are of the form

A → BC,
A → a, or
S → ǫ,

where A, B, C ∈ N, a ∈ Σ, S → ǫ is in P iff ǫ ∈ L(G), and S does not occur on the right-hand side of any production.

Note that a grammar in Chomsky Normal Form does not have ǫ-rules, i.e., rules of the form A → ǫ, except when ǫ ∈ L(G), in which case S → ǫ is the only ǫ-rule. It also does not have chain rules, i.e., rules of the form A → B, where A, B ∈ N. Thus, in order to convert a grammar to Chomsky Normal Form, we need to show how to eliminate ǫ-rules and chain rules. This is not the end of the story, since we may still have rules of the form A → α where either |α| ≥ 3 or |α| ≥ 2 and α contains terminals. However, dealing with such rules is a simple recoding matter, and we first focus on the elimination of ǫ-rules and chain rules. It turns out that ǫ-rules must be eliminated first.

The first step to eliminate ǫ-rules is to compute the set E(G) of erasable (or nullable) nonterminals

E(G) = {A ∈ N | A =⇒+ ǫ}.

The set E(G) is computed using a sequence of approximations E_i defined as follows:

E_0 = {A ∈ N | (A → ǫ) ∈ P},
E_{i+1} = E_i ∪ {A | (A → B_1 ... B_j ... B_k) ∈ P, B_j ∈ E_i, 1 ≤ j ≤ k}.

Clearly, the E_i form an ascending chain

E_0 ⊆ E_1 ⊆ ... ⊆ E_i ⊆ E_{i+1} ⊆ ... ⊆ N,

and since N is finite, there is a least i, say i_0, such that E_{i_0} = E_{i_0+1}. We claim that E(G) = E_{i_0}. Actually, we prove the following proposition.

Proposition 6.2. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that:

(1) L(G′) = L(G);
(2) P′ contains no ǫ-rules other than S′ → ǫ, and S′ → ǫ ∈ P′ iff ǫ ∈ L(G);
(3) S′ does not occur on the right-hand side of any production in P′.

Proof. We begin by proving that E(G) = E_{i_0}. For this, we prove that E(G) ⊆ E_{i_0} and E_{i_0} ⊆ E(G).

To prove that E_{i_0} ⊆ E(G), we proceed by induction on i. Since E_0 = {A ∈ N | (A → ǫ) ∈ P}, we have A =⇒^1 ǫ, and thus A ∈ E(G). By the induction hypothesis, E_i ⊆

E(G). If A ∈ E_{i+1}, either A ∈ E_i and then A ∈ E(G), or there is some production (A → B_1 ... B_j ... B_k) ∈ P, such that B_j ∈ E_i for all j, 1 ≤ j ≤ k. By the induction hypothesis, B_j =⇒+ ǫ for each j, 1 ≤ j ≤ k, and thus

A =⇒ B_1 ... B_j ... B_k =⇒+ B_2 ... B_j ... B_k =⇒+ B_j ... B_k =⇒+ ǫ,

which shows that A ∈ E(G).

To prove that E(G) ⊆ E_{i_0}, we also proceed by induction, but on the length of a derivation A =⇒+ ǫ. If A =⇒^1 ǫ, then A → ǫ ∈ P, and thus A ∈ E_0 since E_0 = {A ∈ N | (A → ǫ) ∈ P}. If A =⇒^{n+1} ǫ, then

A =⇒ α =⇒^n ǫ,

for some production A → α ∈ P. If α contains terminals or nonterminals not in E(G), it is impossible to derive ǫ from α, and thus, we must have α = B_1 ... B_j ... B_k, with B_j ∈ E(G), for all j, 1 ≤ j ≤ k. However, B_j =⇒^{n_j} ǫ where n_j ≤ n, and by the induction hypothesis, B_j ∈ E_{i_0}. But then, we get A ∈ E_{i_0+1} = E_{i_0}, as desired.

Having shown that E(G) = E_{i_0}, we construct the grammar G′. Its set of productions P′ is defined as follows. First, we create the production S′ → S where S′ ∉ V, to make sure that S′ does not occur on the right-hand side of any rule in P′. Let

P_1 = {A → α ∈ P | α ∈ V^+} ∪ {S′ → S},

and let P_2 be the set of productions

P_2 = {A → α_1 α_2 ... α_k α_{k+1} | ∃α_1 ∈ V*, ..., ∃α_{k+1} ∈ V*, ∃B_1 ∈ E(G), ..., ∃B_k ∈ E(G), A → α_1 B_1 α_2 ... α_k B_k α_{k+1} ∈ P, k ≥ 1, α_1 ... α_{k+1} ≠ ǫ}.

Note that ǫ ∈ L(G) iff S ∈ E(G). If S ∉ E(G), then let P′ = P_1 ∪ P_2, and if S ∈ E(G), then let P′ = P_1 ∪ P_2 ∪ {S′ → ǫ}. We claim that L(G′) = L(G), which is proved by showing that every derivation using G can be simulated by a derivation using G′, and vice-versa. All the conditions of the proposition are now met.

From a practical point of view, the construction of Proposition 6.2 is very costly. For example, given a grammar containing the productions

S → ABCDEF,
A → ǫ,
B → ǫ,
C → ǫ,
D → ǫ,
E → ǫ,
F → ǫ,
...,

eliminating ǫ-rules will create 2^6 − 1 = 63 new rules corresponding to the 63 nonempty subsets of the set {A, B, C, D, E, F}.

We now turn to the elimination of chain rules. It turns out that matters are greatly simplified if we first apply Proposition 6.2 to the input grammar G, and we explain the construction assuming that G = (V, Σ, P, S) satisfies the conditions of Proposition 6.2. For every nonterminal A ∈ N, we define the set

I_A = {B ∈ N | A =⇒+ B}.

The sets I_A are computed using approximations I_{A,i} defined as follows:

I_{A,0} = {B ∈ N | (A → B) ∈ P},
I_{A,i+1} = I_{A,i} ∪ {C ∈ N | (B → C) ∈ P, and B ∈ I_{A,i}}.

Clearly, for every A ∈ N, the I_{A,i} form an ascending chain

I_{A,0} ⊆ I_{A,1} ⊆ ... ⊆ I_{A,i} ⊆ I_{A,i+1} ⊆ ... ⊆ N,

and since N is finite, there is a least i, say i_0, such that I_{A,i_0} = I_{A,i_0+1}. We claim that I_A = I_{A,i_0}. Actually, we prove the following proposition.

Proposition 6.3. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that:

(1) L(G′) = L(G);
(2) Every rule in P′ is of the form A → α where |α| ≥ 2, or A → a where a ∈ Σ, or S′ → ǫ iff ǫ ∈ L(G);
(3) S′ does not occur on the right-hand side of any production in P′.

Proof. First, we apply Proposition 6.2 to the grammar G, obtaining a grammar G_1 = (V_1, Σ, P_1, S_1). The proof that I_A = I_{A,i_0} is similar to the proof that E(G) = E_{i_0}. First, we prove that I_{A,i} ⊆ I_A by induction on i. This is straightforward. Next, we prove that I_A ⊆ I_{A,i_0} by induction on derivations of the form A =⇒+ B. In this part of the proof, we use the fact that G_1 has no ǫ-rules except perhaps S_1 → ǫ, and that S_1 does not occur on the right-hand side of any rule. This implies that a derivation A =⇒^{n+1} C is necessarily of the form A =⇒^n B =⇒ C for some B ∈ N. Then, in the induction step, we have B ∈ I_{A,i_0}, and thus C ∈ I_{A,i_0+1} = I_{A,i_0}.

We now define the following sets of rules. Let

P_2 = P_1 − {A → B | A → B ∈ P_1},

and let

P_3 = {A → α | B → α ∈ P_1, α ∉ N_1, B ∈ I_A}.

We claim that G′ = (V_1, Σ, P_2 ∪ P_3, S_1) satisfies the conditions of the proposition.
For example, S_1 does not appear on the right-hand side of any production, since the productions in P_3 have right-hand sides from P_1, and S_1 does not appear on the right-hand side in P_1. It is also easily shown that L(G′) = L(G_1) = L(G).
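Both fixpoint computations in this section, the nullable set E(G) and the chain sets I_A, follow the same saturation pattern and can be sketched directly. A minimal sketch (the encoding of a grammar as a list of (lhs, rhs) pairs is a choice made for this example, not from the text):

```python
def nullable(productions):
    """E(G): iterate E_{i+1} = E_i ∪ {A | A → B1...Bk with all Bj in E_i}."""
    E = {A for A, rhs in productions if rhs == ()}
    changed = True
    while changed:
        changed = False
        for A, rhs in productions:
            if A not in E and rhs and all(X in E for X in rhs):
                E.add(A)
                changed = True
    return E

def chain_sets(productions, nonterminals):
    """I_A = {B in N | A =⇒+ B using chain rules only}, by saturation."""
    I = {A: set() for A in nonterminals}
    for A, rhs in productions:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            I[A].add(rhs[0])
    changed = True
    while changed:
        changed = False
        for A in nonterminals:
            for B in list(I[A]):
                if not I[B] <= I[A]:        # pull in B's chain successors
                    I[A] |= I[B]
                    changed = True
    return I

# Grammar G3: E → E+T | T, T → T*F | F, F → (E) | a
G3 = [("E", ("E", "+", "T")), ("E", ("T",)),
      ("T", ("T", "*", "F")), ("T", ("F",)),
      ("F", ("(", "E", ")")), ("F", ("a",))]
```

Running `chain_sets` on G3 reproduces the sets computed by hand on the next page: I_E = {T, F}, I_T = {F}, I_F = ∅.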

Let us apply the method of Proposition 6.3 to the grammar G3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + T,
E → T,
T → T ∗ F,
T → F,
F → (E),
F → a.

We get I_E = {T, F}, I_T = {F}, and I_F = ∅. The new grammar G′_3 has the set of rules

E → E + T,
E → T ∗ F,
E → (E),
E → a,
T → T ∗ F,
T → (E),
T → a,
F → (E),
F → a.

At this stage, the grammar obtained in Proposition 6.3 no longer has ǫ-rules (except perhaps S′ → ǫ iff ǫ ∈ L(G)) or chain rules. However, it may contain rules A → α with |α| ≥ 3, or with |α| ≥ 2 and where α contains terminal(s). To obtain the Chomsky Normal Form, we need to eliminate such rules. This is not difficult, but notationally a bit messy.

Proposition 6.4. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that L(G′) = L(G) and G′ is in Chomsky Normal Form, that is, a grammar whose productions are of the form

A → BC,
A → a, or
S′ → ǫ,

where A, B, C ∈ N′, a ∈ Σ, S′ → ǫ is in P′ iff ǫ ∈ L(G), and S′ does not occur on the right-hand side of any production in P′.

Proof. First, we apply Proposition 6.3, obtaining G_1. Let Σ_r be the set of terminals occurring on the right-hand side of rules A → α ∈ P_1, with |α| ≥ 2. For every a ∈ Σ_r, let X_a be a new nonterminal not in V_1. Let

P_2 = {X_a → a | a ∈ Σ_r}.

Let P_{1,r} be the set of productions

A → α_1 a_1 α_2 ... α_k a_k α_{k+1},

where a_1, ..., a_k ∈ Σ_r and α_i ∈ N_1^*. For every production

A → α_1 a_1 α_2 ... α_k a_k α_{k+1}

in P_{1,r}, let

A → α_1 X_{a_1} α_2 ... α_k X_{a_k} α_{k+1}

be a new production, and let P_3 be the set of all such productions. Let

P_4 = (P_1 − P_{1,r}) ∪ P_2 ∪ P_3.

Now, productions A → α in P_4 with |α| ≥ 2 do not contain terminals. However, we may still have productions A → α ∈ P_4 with |α| ≥ 3. We can perform some recoding using some new nonterminals. For every production of the form

A → B_1 ... B_k,

where k ≥ 3, create the new nonterminals

[B_1 ... B_{k−1}], [B_1 ... B_{k−2}], ..., [B_1 B_2 B_3], [B_1 B_2],

and the new productions

A → [B_1 ... B_{k−1}] B_k,
[B_1 ... B_{k−1}] → [B_1 ... B_{k−2}] B_{k−1},
...,
[B_1 B_2 B_3] → [B_1 B_2] B_3,
[B_1 B_2] → B_1 B_2.

All the productions are now in Chomsky Normal Form, and it is clear that the same language is generated.

Applying the first phase of the method of Proposition 6.4 to the grammar G′_3, we get the

rules

E → E X_+ T,
E → T X_∗ F,
E → X_( E X_),
E → a,
T → T X_∗ F,
T → X_( E X_),
T → a,
F → X_( E X_),
F → a,
X_+ → +,
X_∗ → ∗,
X_( → (,
X_) → ).

After applying the second phase of the method, we get the following grammar in Chomsky Normal Form:

E → [E X_+] T,
[E X_+] → E X_+,
E → [T X_∗] F,
[T X_∗] → T X_∗,
E → [X_( E] X_),
[X_( E] → X_( E,
E → a,
T → [T X_∗] F,
T → [X_( E] X_),
T → a,
F → [X_( E] X_),
F → a,
X_+ → +,
X_∗ → ∗,
X_( → (,
X_) → ).

For large grammars, it is often convenient to use the abbreviation which consists in grouping productions having a common left-hand side, and listing the right-hand sides separated

by the symbol |. Thus, a group of productions

A → α_1,
A → α_2,
...,
A → α_k,

may be abbreviated as

A → α_1 | α_2 | ... | α_k.

An interesting corollary of the CNF is the following decidability result. There is an algorithm which, given a context-free grammar G, given any string w ∈ Σ*, decides whether w ∈ L(G). Indeed, we first convert G to a grammar G′ in Chomsky Normal Form. If w = ǫ, we can test whether ǫ ∈ L(G), since this is the case iff S′ → ǫ ∈ P′. If w ≠ ǫ, letting n = |w|, note that since the rules are of the form A → BC or A → a, where a ∈ Σ, any derivation for w has n − 1 + n = 2n − 1 steps. Thus, we enumerate all (leftmost) derivations of length 2n − 1. There are much better parsing algorithms than this naive algorithm. We now show that every regular language is context-free.

6.4 Regular Languages are Context-Free

The regular languages can be characterized in terms of very special kinds of context-free grammars, right-linear (and left-linear) context-free grammars.

Definition 6.6. A context-free grammar G = (V, Σ, P, S) is left-linear iff its productions are of the form

A → Ba,
A → a,
A → ǫ,

where A, B ∈ N, and a ∈ Σ. A context-free grammar G = (V, Σ, P, S) is right-linear iff its productions are of the form

A → aB,
A → a,
A → ǫ,

where A, B ∈ N, and a ∈ Σ.

The following proposition shows the equivalence between NFA's and right-linear grammars.
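The automaton-to-grammar direction of this equivalence is a one-line transcription of the transition table. A minimal sketch (the dict encoding of δ and the tuple encoding of rules are assumptions of the example, not from the text): every transition δ(p, a) = q becomes a rule p → aq, and every final state p contributes p → ǫ.

```python
def dfa_to_right_linear(delta, q0, final_states):
    """Build P = {p → a q | δ(p, a) = q} ∪ {p → ǫ | p in F}.

    Nonterminals are the DFA states; ǫ-rules are encoded as empty
    right-hand sides. Returns (productions, start symbol).
    """
    productions = [(p, (a, q)) for (p, a), q in delta.items()]
    productions += [(p, ()) for p in final_states]
    return productions, q0

# DFA accepting a*b over {a, b} (dead state omitted for brevity)
delta = {(0, "a"): 0, (0, "b"): 1}
grammar, start = dfa_to_right_linear(delta, 0, {1})
```

Here the resulting grammar is 0 → a0, 0 → b1, 1 → ǫ, with start symbol 0, which indeed generates a*b.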

Proposition 6.5. A language L is regular if and only if it is generated by some right-linear grammar.

Proof. Let L = L(D) for some DFA D = (Q, Σ, δ, q_0, F). We construct a right-linear grammar G as follows. Let V = Q ∪ Σ, S = q_0, and let P be defined as follows:

P = {p → aq | q = δ(p, a), p, q ∈ Q, a ∈ Σ} ∪ {p → ǫ | p ∈ F}.

It is easily shown by induction on the length of w that

p =⇒* wq iff q = δ*(p, w),

and thus, L(D) = L(G).

Conversely, let G = (V, Σ, P, S) be a right-linear grammar. First, let G′ = (V′, Σ, P′, S) be the right-linear grammar obtained from G by adding the new nonterminal E to N, replacing every rule in P of the form A → a where a ∈ Σ by the rule A → aE, and adding the rule E → ǫ. It is immediately verified that L(G′) = L(G). Next, we construct the NFA M = (Q, Σ, δ, q_0, F) as follows: Q = N′ = N ∪ {E}, q_0 = S, F = {A ∈ N′ | (A → ǫ) ∈ P′}, and

δ(A, a) = {B ∈ N′ | (A → aB) ∈ P′},

for all A ∈ N′ and all a ∈ Σ. It is easily shown by induction on the length of w that

A =⇒* wB iff B ∈ δ*(A, w),

and thus, L(M) = L(G′) = L(G).

A similar proposition holds for left-linear grammars. It is also easily shown that the regular languages are exactly the languages generated by context-free grammars whose rules are of the form

A → Bu,
A → u,

where A, B ∈ N, and u ∈ Σ*.

6.5 Useless Productions in Context-Free Grammars

Given a context-free grammar G = (V, Σ, P, S), it may contain rules that are useless for a number of reasons. For example, consider the grammar G_3 = ({E, A, a, b}, {a, b}, P, E), where P is the set of rules

E → aEb,
E → ab,
E → A,
A → bAa.

The problem is that the nonterminal A does not derive any terminal strings, and thus, it is useless, as well as the last two productions. Let us now consider the grammar G_4 = ({E, A, a, b, c, d}, {a, b, c, d}, P, E), where P is the set of rules

E → aEb,
E → ab,
A → cAd,
A → cd.

This time, the nonterminal A generates strings of the form c^n d^n, but there is no derivation E =⇒+ α from E where A occurs in α. The nonterminal A is not connected to E, and the last two rules are useless. Fortunately, it is possible to find such useless rules, and to eliminate them.

Let T(G) be the set of nonterminals that actually derive some terminal string, i.e.

T(G) = {A ∈ (V − Σ) | ∃w ∈ Σ*, A =⇒+ w}.

The set T(G) can be defined by stages. We define the sets T_n (n ≥ 1) as follows:

T_1 = {A ∈ (V − Σ) | ∃(A → w) ∈ P, with w ∈ Σ*},

and

T_{n+1} = T_n ∪ {A ∈ (V − Σ) | ∃(A → β) ∈ P, with β ∈ (T_n ∪ Σ)*}.

It is easy to prove that there is some least n such that T_{n+1} = T_n, and that for this n, T(G) = T_n. If S ∉ T(G), then L(G) = ∅, and G is equivalent to the trivial grammar G′ = ({S}, Σ, ∅, S).

If S ∈ T(G), then let U(G) be the set of nonterminals that are actually useful, i.e.,

U(G) = {A ∈ T(G) | ∃α, β ∈ (T(G) ∪ Σ)*, S =⇒* αAβ}.

The set U(G) can also be computed by stages. We define the sets U_n (n ≥ 1) as follows:

U_1 = {A ∈ T(G) | ∃(S → αAβ) ∈ P, with α, β ∈ (T(G) ∪ Σ)*},

and

U_{n+1} = U_n ∪ {B ∈ T(G) | ∃(A → αBβ) ∈ P, with A ∈ U_n, α, β ∈ (T(G) ∪ Σ)*}.

It is easy to prove that there is some least n such that U_{n+1} = U_n, and that for this n, U(G) = U_n ∪ {S}. Then, we can use U(G) to transform G into an equivalent CFG in

which every nonterminal is useful (i.e., for which V − Σ = U(G)). Indeed, simply delete all rules containing symbols not in U(G). The details are left as an exercise.

We say that a context-free grammar G is reduced if all its nonterminals are useful, i.e., N = U(G).

It should be noted that although dull, the above considerations are important in practice. Certain algorithms for constructing parsers, for example, LR-parsers, may loop if useless rules are not eliminated!

We now consider another normal form for context-free grammars, the Greibach Normal Form.

6.6 The Greibach Normal Form

Every CFG G can also be converted to an equivalent grammar in Greibach Normal Form (for short, GNF). A context-free grammar G = (V, Σ, P, S) is in Greibach Normal Form iff its productions are of the form

A → aBC,
A → aB,
A → a, or
S → ǫ,

where A, B, C ∈ N, a ∈ Σ, S → ǫ is in P iff ǫ ∈ L(G), and S does not occur on the right-hand side of any production.

Note that a grammar in Greibach Normal Form does not have ǫ-rules other than possibly S → ǫ. More importantly, except for the special rule S → ǫ, every rule produces some terminal symbol.

An important consequence of the Greibach Normal Form is that no nonterminal is left recursive. A nonterminal A is left recursive iff A =⇒+ Aα for some α ∈ V*. Left recursive nonterminals cause top-down deterministic parsers to loop. The Greibach Normal Form provides a way of avoiding this problem.

There are no easy proofs that every CFG can be converted to a Greibach Normal Form. A particularly elegant method due to Rosenkrantz using least fixed-points and matrices will be given in section 6.9.

Proposition 6.6. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that L(G′) = L(G) and G′ is in Greibach Normal Form, that is, a grammar whose productions are of the form

A → aBC,
A → aB,
A → a, or
S′ → ǫ,

where A, B, C ∈ N′, a ∈ Σ, S′ → ǫ is in P′ iff ǫ ∈ L(G), and S′ does not occur on the right-hand side of any production in P′.

6.7 Least Fixed-Points

Context-free languages can also be characterized as least fixed-points of certain functions induced by grammars. This characterization yields a rather quick proof that every context-free grammar can be converted to Greibach Normal Form. This characterization also reveals very clearly the recursive nature of the context-free languages. We begin by reviewing what we need from the theory of partially ordered sets.

Definition 6.7. Given a partially ordered set ⟨A, ≤⟩, an ω-chain (a_n)_{n≥0} is a sequence such that a_n ≤ a_{n+1} for all n ≥ 0. The least upper bound of an ω-chain (a_n) is an element a ∈ A such that:

(1) a_n ≤ a, for all n ≥ 0;
(2) for any b ∈ A, if a_n ≤ b for all n ≥ 0, then a ≤ b.

A partially ordered set ⟨A, ≤⟩ is an ω-chain complete poset iff it has a least element ⊥, and iff every ω-chain has a least upper bound denoted as ⊔ a_n.

Remark: The ω in ω-chain means that we are considering countable chains (ω is the ordinal associated with the order-type of the set of natural numbers). This notation may seem arcane, but is standard in denotational semantics.

For example, given any set X, the power set 2^X ordered by inclusion ⊆ is an ω-chain complete poset with least element ∅. The Cartesian product 2^X × ... × 2^X (n copies) ordered such that

(A_1, ..., A_n) ≤ (B_1, ..., B_n) iff A_i ⊆ B_i

(where A_i, B_i ∈ 2^X) is an ω-chain complete poset with least element (∅, ..., ∅).

We are interested in functions between partially ordered sets.

Definition 6.8. Given any two partially ordered sets ⟨A_1, ≤_1⟩ and ⟨A_2, ≤_2⟩, a function f: A_1 → A_2 is monotonic iff for all x, y ∈ A_1,

x ≤_1 y implies that f(x) ≤_2 f(y).

If ⟨A_1, ≤_1⟩ and ⟨A_2, ≤_2⟩ are ω-chain complete posets, a function f: A_1 → A_2 is ω-continuous iff it is monotonic, and for every ω-chain (a_n),

f(⊔ a_n) = ⊔ f(a_n).

Remark: Note that we are not requiring that an ω-continuous function f: A_1 → A_2 preserve least elements, i.e., it is possible that f(⊥_1) ≠ ⊥_2.

We now define the crucial concept of a least fixed-point.

Definition 6.9. Let ⟨A, ≤⟩ be a partially ordered set, and let f: A → A be a function. A fixed-point of f is an element a ∈ A such that f(a) = a. The least fixed-point of f is an element a ∈ A such that f(a) = a, and for every b ∈ A such that f(b) = b, then a ≤ b.

The following proposition gives sufficient conditions for the existence of least fixed-points. It is one of the key propositions in denotational semantics.

Proposition 6.7. Let ⟨A, ≤⟩ be an ω-chain complete poset with least element ⊥. Every ω-continuous function f: A → A has a unique least fixed-point x_0 given by

x_0 = ⊔ f^n(⊥).

Furthermore, for any b ∈ A such that f(b) ≤ b, then x_0 ≤ b.

Proof. First, we prove that the sequence

⊥, f(⊥), f^2(⊥), ..., f^n(⊥), ...

is an ω-chain. This is shown by induction on n. Since ⊥ is the least element of A, we have ⊥ ≤ f(⊥). Assuming by induction that f^n(⊥) ≤ f^{n+1}(⊥), since f is ω-continuous, it is monotonic, and thus we get f^{n+1}(⊥) ≤ f^{n+2}(⊥), as desired.

Since A is an ω-chain complete poset, the ω-chain (f^n(⊥)) has a least upper bound

x_0 = ⊔ f^n(⊥).

Since f is ω-continuous, we have

f(x_0) = f(⊔ f^n(⊥)) = ⊔ f(f^n(⊥)) = ⊔ f^{n+1}(⊥) = x_0,

and x_0 is indeed a fixed-point of f.

Clearly, if f(b) ≤ b implies that x_0 ≤ b, then f(b) = b implies that x_0 ≤ b. Thus, assume that f(b) ≤ b for some b ∈ A. We prove by induction on n that f^n(⊥) ≤ b. Indeed, ⊥ ≤ b, since ⊥ is the least element of A. Assuming by induction that f^n(⊥) ≤ b, by monotonicity of f, we get

f(f^n(⊥)) ≤ f(b),

and since f(b) ≤ b, this yields

f^{n+1}(⊥) ≤ b.

Since f^n(⊥) ≤ b for all n ≥ 0, we have

x_0 = ⊔ f^n(⊥) ≤ b.
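When A is a finite poset, as in the E_i, I_{A,i}, and T_n computations earlier in this chapter, the chain f^n(⊥) of Proposition 6.7 becomes stationary after finitely many steps, and the least fixed-point can be computed directly. A minimal sketch under that finiteness assumption (the function name and the example map are choices of this illustration, not from the text):

```python
def least_fixed_point(f, bottom):
    """Kleene iteration ⊥, f(⊥), f²(⊥), ...; assumes f is monotone and the
    ascending chain stabilizes (guaranteed over a finite poset)."""
    x = bottom
    while True:
        y = f(x)
        if y == x:        # f(x) = x: a fixed-point has been reached
            return x
        x = y

# Example: on the powerset of {0, 1, 2} ordered by inclusion, the monotone
# map S ↦ S ∪ {0} ∪ {x + 1 | x in S, x < 2} has least fixed-point {0, 1, 2}.
reachable = least_fixed_point(
    lambda S: S | {0} | {x + 1 for x in S if x < 2}, frozenset())
```

The iterates here are ∅, {0}, {0, 1}, {0, 1, 2}, {0, 1, 2}, mirroring how the approximations E_i and T_n grow until they stop.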

The second part of Proposition 6.7 is very useful to prove that functions have the same least fixed-point. For example, under the conditions of Proposition 6.7, if g: A → A is another ω-continuous function, letting x_0 be the least fixed-point of f and y_0 be the least fixed-point of g, if f(y_0) ≤ y_0 and g(x_0) ≤ x_0, we can deduce that x_0 = y_0. Indeed, since f(y_0) ≤ y_0 and x_0 is the least fixed-point of f, we get x_0 ≤ y_0, and since g(x_0) ≤ x_0 and y_0 is the least fixed-point of g, we get y_0 ≤ x_0, and therefore x_0 = y_0.

Proposition 6.7 also shows that the least fixed-point x_0 of f can be approximated as much as desired, using the sequence (f^n(⊥)). We will now apply this fact to context-free grammars. For this, we need to show how a context-free grammar G = (V, Σ, P, S) with m nonterminals induces an ω-continuous map

Φ_G : 2^{Σ*} × ... × 2^{Σ*} → 2^{Σ*} × ... × 2^{Σ*} (m copies on each side).

6.8 Context-Free Languages as Least Fixed-Points

Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, ..., A_m, grouping all the productions having the same left-hand side, the grammar G can be concisely written as

A_1 → α_{1,1} + ... + α_{1,n_1},
...
A_i → α_{i,1} + ... + α_{i,n_i},
...
A_m → α_{m,1} + ... + α_{m,n_m}.

Given any set A, let P_fin(A) be the set of finite subsets of A.

Definition 6.10. Let G = (V, Σ, P, S) be a context-free grammar with m nonterminals A_1, ..., A_m. For any m-tuple Λ = (L_1, ..., L_m) of languages L_i ⊆ Σ*, we define the function

Φ[Λ]: P_fin(V*) → 2^{Σ*}

inductively as follows:

Φ[Λ](∅) = ∅,
Φ[Λ]({ǫ}) = {ǫ},
Φ[Λ]({a}) = {a}, if a ∈ Σ,
Φ[Λ]({A_i}) = L_i, if A_i ∈ N,
Φ[Λ]({αX}) = Φ[Λ]({α}) Φ[Λ]({X}), if α ∈ V^+, X ∈ V,
Φ[Λ](Q ∪ {α}) = Φ[Λ](Q) ∪ Φ[Λ]({α}), if Q ∈ P_fin(V*), Q ≠ ∅, α ∈ V*, α ∉ Q.

Then, writing the grammar G as

A_1 → α_{1,1} + ... + α_{1,n_1},
...
A_i → α_{i,1} + ... + α_{i,n_i},
...
A_m → α_{m,1} + ... + α_{m,n_m},

we define the map

Φ_G : 2^{Σ*} × ... × 2^{Σ*} → 2^{Σ*} × ... × 2^{Σ*}

such that

Φ_G(L_1, ..., L_m) = (Φ[Λ]({α_{1,1}, ..., α_{1,n_1}}), ..., Φ[Λ]({α_{m,1}, ..., α_{m,n_m}}))

for all Λ = (L_1, ..., L_m) ∈ 2^{Σ*} × ... × 2^{Σ*}.

One should verify that the map Φ[Λ] is well defined, but this is easy. The following proposition is easily shown:

Proposition 6.8. Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, ..., A_m, the map

Φ_G : 2^{Σ*} × ... × 2^{Σ*} → 2^{Σ*} × ... × 2^{Σ*}

is ω-continuous.

Now, 2^{Σ*} × ... × 2^{Σ*} is an ω-chain complete poset, and the map Φ_G is ω-continuous. Thus, by Proposition 6.7, the map Φ_G has a least fixed-point. It turns out that the components of this least fixed-point are precisely the languages generated by the grammars (V, Σ, P, A_i). Before proving this fact, let us give an example illustrating it.

Example. Consider the grammar G = ({A, B, a, b}, {a, b}, P, A) defined by the rules

A → BB + ab,
B → aBb + ab.

The least fixed-point of Φ_G is the least upper bound of the chain

(Φ_G^n(∅, ∅)) = (Φ_{G,A}^n(∅, ∅), Φ_{G,B}^n(∅, ∅)),

where

Φ_{G,A}^0(∅, ∅) = Φ_{G,B}^0(∅, ∅) = ∅,

and

Φ_{G,A}^{n+1}(∅, ∅) = Φ_{G,B}^n(∅, ∅) Φ_{G,B}^n(∅, ∅) ∪ {ab},
Φ_{G,B}^{n+1}(∅, ∅) = a Φ_{G,B}^n(∅, ∅) b ∪ {ab}.

It is easy to verify that

Φ_{G,A}^1(∅, ∅) = {ab},
Φ_{G,B}^1(∅, ∅) = {ab},
Φ_{G,A}^2(∅, ∅) = {ab, abab},
Φ_{G,B}^2(∅, ∅) = {ab, aabb},
Φ_{G,A}^3(∅, ∅) = {ab, abab, abaabb, aabbab, aabbaabb},
Φ_{G,B}^3(∅, ∅) = {ab, aabb, aaabbb}.

By induction, we can easily prove that the two components of the least fixed-point are the languages

L_A = {a^m b^m a^n b^n | m, n ≥ 1} ∪ {ab} and L_B = {a^n b^n | n ≥ 1}.

Letting G_A = ({A, B, a, b}, {a, b}, P, A) and G_B = ({A, B, a, b}, {a, b}, P, B), it is indeed true that L_A = L(G_A) and L_B = L(G_B).

We have the following theorem due to Ginsburg and Rice:

Theorem 6.9. Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, ..., A_m, the least fixed-point of the map Φ_G is the m-tuple of languages

(L(G_{A_1}), ..., L(G_{A_m})),

where G_{A_i} = (V, Σ, P, A_i).

Proof. Writing G as

A_1 → α_{1,1} + ... + α_{1,n_1},
...
A_i → α_{i,1} + ... + α_{i,n_i},
...
A_m → α_{m,1} + ... + α_{m,n_m},

let M = max{|α_{i,j}|} be the maximum length of right-hand sides of rules in P. Let

Φ_G^n(∅, ..., ∅) = (Φ_{G,1}^n(∅, ..., ∅), ..., Φ_{G,m}^n(∅, ..., ∅)).

Then, for any w ∈ Σ*, observe that

w ∈ Φ_{G,i}^1(∅, ..., ∅)

iff there is some rule A_i → α_{i,j} with w = α_{i,j}, and that

w ∈ Φ_{G,i}^n(∅, ..., ∅)

for some n ≥ 2 iff there is some rule A_i → α_{i,j} with α_{i,j} of the form

α_{i,j} = u_1 A_{j_1} u_2 ... u_k A_{j_k} u_{k+1},

where u_1, ..., u_{k+1} ∈ Σ*, k ≥ 1, and some w_1, ..., w_k ∈ Σ* such that

w_h ∈ Φ_{G,j_h}^{n−1}(∅, ..., ∅),

and

w = u_1 w_1 u_2 ... u_k w_k u_{k+1}.

We prove the following two claims.

Claim 1: For every w ∈ Σ*, if A_i =⇒^n w, then w ∈ Φ_{G,i}^p(∅, ..., ∅), for some p ≥ 1.

Claim 2: For every w ∈ Σ*, if w ∈ Φ_{G,i}^n(∅, ..., ∅), with n ≥ 1, then A_i =⇒^p w for some p ≤ (M + 1)^{n−1}.

Proof of Claim 1. We proceed by induction on n. If A_i =⇒^1 w, then w = α_{i,j} for some rule A_i → α_{i,j}, and by the remark just before the claim, w ∈ Φ_{G,i}^1(∅, ..., ∅). If A_i =⇒^{n+1} w with n ≥ 1, then

A_i =⇒ α_{i,j} =⇒^n w

for some rule A_i → α_{i,j}. If

α_{i,j} = u_1 A_{j_1} u_2 ... u_k A_{j_k} u_{k+1},

where u_1, ..., u_{k+1} ∈ Σ*, k ≥ 1, then A_{j_h} =⇒^{n_h} w_h, where n_h ≤ n, and

w = u_1 w_1 u_2 ... u_k w_k u_{k+1}

for some w_1, ..., w_k ∈ Σ*. By the induction hypothesis,

w_h ∈ Φ_{G,j_h}^{p_h}(∅, ..., ∅),

for some p_h ≥ 1, for every h, 1 ≤ h ≤ k. Letting p = max{p_1, ..., p_k}, since each sequence (Φ_{G,i}^q(∅, ..., ∅)) is an ω-chain, we have w_h ∈ Φ_{G,j_h}^p(∅, ..., ∅) for every h, 1 ≤ h ≤ k, and by the remark just before the claim,

w ∈ Φ_{G,i}^{p+1}(∅, ..., ∅).

Proof of Claim 2. We proceed by induction on n. If w ∈ Φ_{G,i}^1(∅, ..., ∅), by the remark just before the claim, then w = α_{i,j} for some rule A_i → α_{i,j}, and A_i =⇒^1 w. If w ∈ Φ_{G,i}^n(∅, ..., ∅) for some n ≥ 2, then there is some rule A_i → α_{i,j} with α_{i,j} of the form

α_{i,j} = u_1 A_{j_1} u_2 ... u_k A_{j_k} u_{k+1},

where u_1, ..., u_{k+1} ∈ Σ*, k ≥ 1, and some w_1, ..., w_k ∈ Σ* such that

w_h ∈ Φ_{G,j_h}^{n−1}(∅, ..., ∅),

and

w = u_1 w_1 u_2 ... u_k w_k u_{k+1}.

By the induction hypothesis, A_{j_h} =⇒^{p_h} w_h with p_h ≤ (M + 1)^{n−2}, and thus

A_i =⇒ u_1 A_{j_1} u_2 ... u_k A_{j_k} u_{k+1} =⇒^{p_1} ... =⇒^{p_k} w,

so that A_i =⇒^p w with

p ≤ p_1 + ... + p_k + 1 ≤ M(M + 1)^{n−2} + 1 ≤ (M + 1)^{n−1},

since k ≤ M.

Combining Claim 1 and Claim 2, we have

L(G_{A_i}) = ⋃_n Φ_{G,i}^n(∅, ..., ∅),

which proves that the least fixed-point of the map Φ_G is the m-tuple of languages

(L(G_{A_1}), ..., L(G_{A_m})).

We now show how Theorem 6.9 can be used to give a short proof that every context-free grammar can be converted to Greibach Normal Form.

6.9 Least Fixed-Points and the Greibach Normal Form

The hard part in converting a grammar G = (V, Σ, P, S) to Greibach Normal Form is to convert it to a grammar in so-called weak Greibach Normal Form, where the productions are of the form

A → aα, or
S → ǫ,

where a ∈ Σ, α ∈ V*, and if S → ǫ is a rule, then S does not occur on the right-hand side of any rule.

Indeed, if we first convert G to Chomsky Normal Form, it turns out that we will get rules of the form A → BC, A → B or A → a. Using the algorithm for eliminating ǫ-rules and chain rules, we can first convert the original grammar to a grammar with no chain rules and no ǫ-rules except possibly S → ǫ, in which case S does not appear on the right-hand side of rules. Thus, for the purpose of converting to weak Greibach Normal Form, we can assume that we are dealing with grammars without chain rules and without ǫ-rules. Let us also assume that we computed the set T(G) of nonterminals that actually derive some terminal string, and that useless productions involving symbols not in T(G) have been deleted.

Let us explain the idea of the conversion using the following grammar:

    A → AaB + BB + b,
    B → Bd + BAa + aA + c.

The first step is to group the right-hand sides α into two categories: those whose leftmost symbol is a terminal (α ∈ ΣV*) and those whose leftmost symbol is a nonterminal (α ∈ NV*). It is also convenient to adopt a matrix notation, and we can write the above grammar as

    (A, B) = (A, B) ( aB    ∅       ) + (b, {aA, c}).
                    ( B     {d, Aa} )

Thus, we are dealing with matrices (and row vectors) whose entries are finite subsets of V*. For notational simplicity, braces around singleton sets are omitted. The finite subsets of V* form a semiring, where addition is union, and multiplication is concatenation. Addition and multiplication of matrices are as usual, except that the semiring operations are used.

We will also consider matrices whose entries are languages over Σ. Again, the languages over Σ form a semiring, where addition is union, and multiplication is concatenation. The identity element for addition is ∅, and the identity element for multiplication is {ǫ}. As above, addition and multiplication of matrices are as usual, except that the semiring operations are used.
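These semiring operations are easy to make concrete. A minimal sketch (all names are my own), assuming finite languages are represented as Python sets of strings, with "" standing for ǫ, and matrices as nested lists:

```python
# The semiring of finite languages: addition is union, multiplication is
# concatenation, lifted entrywise to matrices.

def lang_mul(A, B):
    # concatenation of two finite languages
    return {u + v for u in A for v in B}

def mat_add(M, N):
    return [[M[i][j] | N[i][j] for j in range(len(M[0]))]
            for i in range(len(M))]

def mat_mul(M, N):
    rows, inner, cols = len(M), len(N), len(N[0])
    P = [[set() for _ in range(cols)] for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for l in range(inner):
                P[i][j] |= lang_mul(M[i][l], N[l][j])
    return P

def identity(n):
    # {eps} on the diagonal, the empty language elsewhere
    return [[{""} if i == j else set() for j in range(n)] for i in range(n)]

A = [[{"a"}, {"b"}], [set(), {"c"}]]
print(mat_mul(A, identity(2)) == A)   # True
```

The last line checks that the identity matrix really is neutral for this multiplication, as the semiring identities {ǫ} and ∅ require.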
For example, given any languages A_{i,j} and B_{i,j} over Σ, where i,j ∈ {1,2}, we have

    ( A_{1,1}  A_{1,2} ) ( B_{1,1}  B_{1,2} )  =  ( A_{1,1}B_{1,1} ∪ A_{1,2}B_{2,1}   A_{1,1}B_{1,2} ∪ A_{1,2}B_{2,2} )
    ( A_{2,1}  A_{2,2} ) ( B_{2,1}  B_{2,2} )     ( A_{2,1}B_{1,1} ∪ A_{2,2}B_{2,1}   A_{2,1}B_{1,2} ∪ A_{2,2}B_{2,2} )

Letting X = (A, B), K = (b, {aA, c}), and

    H = ( aB    ∅       )
        ( B     {d, Aa} )

the above grammar can be concisely written as

    X = XH + K.

More generally, given any context-free grammar G = (V,Σ,P,S) with m nonterminals A_1,...,A_m, assuming that there are no chain rules, no ǫ-rules, and that every nonterminal belongs to T(G), letting X = (A_1,...,A_m), we can write G as

    X = XH + K,

for some appropriate m × m matrix H in which every entry contains a set (possibly empty) of strings in V+, and some row vector K in which every entry contains a set (possibly empty) of strings α each beginning with a terminal (α ∈ ΣV*).

Given an m × m square matrix A = (A_{i,j}) of languages over Σ, we can define the matrix A* whose entry A*_{i,j} is given by

    A*_{i,j} = ∪_{n ≥ 0} A^n_{i,j},

where A^0 = Id_m, the identity matrix, and A^n is the n-th power of A. Similarly, we define A+, where

    A+_{i,j} = ∪_{n ≥ 1} A^n_{i,j}.

Given a matrix A whose entries are finite subsets of V*, where N = {A_1,...,A_m}, for any m-tuple Λ = (L_1,...,L_m) of languages over Σ, we let

    Φ[Λ](A) = (Φ[Λ](A_{i,j})).

Given a system X = XH + K where H is an m × m matrix and X, K are row matrices, if H and K do not contain any nonterminals, we claim that the least fixed-point of the grammar G associated with X = XH + K is KH*. This is easily seen by computing the approximations X^n = Φ^n_G(∅,...,∅). Indeed, X^0 = K, and

    X^n = KH^n + KH^{n−1} + ··· + KH + K = K(H^n + H^{n−1} + ··· + H + I_m).

Similarly, if Y is an m × m matrix of nonterminals, the least fixed-point of the grammar associated with Y = HY + H is H+ (provided that H does not contain any nonterminals).

Given any context-free grammar G = (V,Σ,P,S) with m nonterminals A_1,...,A_m, writing G as X = XH + K as explained earlier, we can form another grammar GH by creating m² new nonterminals Y_{i,j}, where the rules of this new grammar are defined by the system of two matrix equations

    X = KY + K,
    Y = HY + H,

where Y = (Y_{i,j}). The following proposition is the key to the Greibach Normal Form.

Proposition 6.10. Given any context-free grammar G = (V,Σ,P,S) with m nonterminals A_1,...,A_m, writing G as X = XH + K as explained earlier, if GH is the grammar defined by the system of two matrix equations

    X = KY + K,
    Y = HY + H,

as explained above, then the components in X of the least fixed-points of the maps Φ_G and Φ_{GH} are equal.

Proof. Let U be the least fixed-point of Φ_G, and let (V,W) be the least fixed-point of Φ_{GH}. We shall prove that U = V. For notational simplicity, let us denote Φ[U](H) as H[U] and Φ[U](K) as K[U].

Since U is the least fixed-point of X = XH + K, we have

    U = UH[U] + K[U].

Since H[U] and K[U] do not contain any nonterminals, by a previous remark, K[U]H*[U] is the least fixed-point of X = XH[U] + K[U], and thus, K[U]H*[U] ⊆ U. On the other hand, by monotonicity,

    K[U]H*[U] H[K[U]H*[U]] + K[K[U]H*[U]] ⊆ K[U]H*[U] H[U] + K[U] = K[U]H*[U],

and since U is the least fixed-point of X = XH + K, we get U ⊆ K[U]H*[U]. Therefore,

    U = K[U]H*[U].

We can prove in a similar manner that W = H[V]+.

Let Z = H[U]+. We have

    K[U]Z + K[U] = K[U]H[U]+ + K[U] = K[U]H[U]* = U,

and

    H[U]Z + H[U] = H[U]H[U]+ + H[U] = H[U]+ = Z,

and since (V,W) is the least fixed-point of X = KY + K and Y = HY + H, we get V ⊆ U and W ⊆ H[U]+.

We also have

    V = K[V]W + K[V] = K[V]H[V]+ + K[V] = K[V]H[V]*,

and

    VH[V] + K[V] = K[V]H[V]*H[V] + K[V] = K[V]H[V]* = V,

and since U is the least fixed-point of X = XH + K, we get U ⊆ V. Therefore, U = V, as claimed.

Note that the above proposition actually applies to any grammar. Applying Proposition 6.10 to our example grammar, we get the following new grammar:

    (A, B) = (b, {aA, c}) ( Y_1  Y_2 ) + (b, {aA, c}),
                          ( Y_3  Y_4 )

    ( Y_1  Y_2 )  =  ( aB    ∅       ) ( Y_1  Y_2 )  +  ( aB    ∅       )
    ( Y_3  Y_4 )     ( B     {d, Aa} ) ( Y_3  Y_4 )     ( B     {d, Aa} )

There are still some nonterminals appearing as leftmost symbols, but using the equations defining A and B, we can replace A with

    {bY_1, aAY_3, cY_3, b}

and B with

    {bY_2, aAY_4, cY_4, aA, c},

obtaining a system in weak Greibach Normal Form. This amounts to converting the matrix

    H = ( aB    ∅       )
        ( B     {d, Aa} )

to the matrix

    L = ( aB                           ∅                             )
        ( {bY_2, aAY_4, cY_4, aA, c}   {d, bY_1a, aAY_3a, cY_3a, ba} )

The weak Greibach Normal Form corresponds to the new system

    X = KY + K,
    Y = LY + L.
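The equivalence of the original system and the system X = KY + K, Y = LY + L can be checked mechanically on this example. A minimal sketch, assuming rules are encoded as lists of symbols (nonterminals are the dictionary keys) and languages are truncated to strings of length at most MAX_LEN; the encoding and the bound are my own illustrative choices:

```python
# Compare the original grammar with the weak GNF system X = KY + K,
# Y = LY + L from the text, on terminal strings of length <= MAX_LEN.

MAX_LEN = 5

ORIG = {
    "A": [["A", "a", "B"], ["B", "B"], ["b"]],
    "B": [["B", "d"], ["B", "A", "a"], ["a", "A"], ["c"]],
}

# Entries of L: L11 = {aB}, L12 = {}, L21 = {bY2, aAY4, cY4, aA, c},
# L22 = {d, bY1a, aAY3a, cY3a, ba}.
L21 = [["b", "Y2"], ["a", "A", "Y4"], ["c", "Y4"], ["a", "A"], ["c"]]
L22 = [["d"], ["b", "Y1", "a"], ["a", "A", "Y3", "a"],
       ["c", "Y3", "a"], ["b", "a"]]

GNF = {
    # X = KY + K with K = (b, {aA, c})
    "A": [["b", "Y1"], ["a", "A", "Y3"], ["c", "Y3"], ["b"]],
    "B": [["b", "Y2"], ["a", "A", "Y4"], ["c", "Y4"], ["a", "A"], ["c"]],
    # Y = LY + L, written out component by component
    "Y1": [["a", "B", "Y1"], ["a", "B"]],
    "Y2": [["a", "B", "Y2"]],                 # L12 is empty, so Y2 is useless
    "Y3": [r + ["Y1"] for r in L21] + [r + ["Y3"] for r in L22] + L21,
    "Y4": [r + ["Y2"] for r in L21] + [r + ["Y4"] for r in L22] + L22,
}

def truncated_lfp(rules, max_len):
    """Least fixed-point of the grammar map, truncated to short strings."""
    langs = {nt: set() for nt in rules}
    while True:
        new = {nt: set() for nt in rules}
        for nt, alts in rules.items():
            for alt in alts:
                partial = {""}
                for sym in alt:
                    pieces = langs[sym] if sym in rules else {sym}
                    partial = {u + v for u in partial for v in pieces
                               if len(u) + len(v) <= max_len}
                new[nt] |= partial
        if new == langs:
            return langs
        langs = new

orig = truncated_lfp(ORIG, MAX_LEN)
gnf = truncated_lfp(GNF, MAX_LEN)
print(orig["A"] == gnf["A"] and orig["B"] == gnf["B"])
```

Note that Y_2 generates the empty language, illustrating the remark below that the construction may introduce useless nonterminals.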

This method works in general for any input grammar with no ǫ-rules, no chain rules, and such that every nonterminal belongs to T(G). Under these conditions, the row vector K contains some nonempty entry, all strings in K are in ΣV*, and all strings in H are in V+. After obtaining the grammar GH defined by the system

    X = KY + K,
    Y = HY + H,

we use the system X = KY + K to express every nonterminal A_i in terms of expressions containing strings α_{i,j} involving a terminal as the leftmost symbol (α_{i,j} ∈ ΣV*), and we replace all leftmost occurrences of nonterminals in H (occurrences A_i in strings of the form A_iβ, where β ∈ V*) using the above expressions. In this fashion, we obtain a matrix L, and it is immediately shown that the system

    X = KY + K,
    Y = LY + L,

generates the same tuple of languages. Furthermore, this last system corresponds to a weak Greibach Normal Form.

If we start with a grammar in Chomsky Normal Form (with no production S → ǫ) such that every nonterminal belongs to T(G), we actually get a Greibach Normal Form (the entries in K are terminals, and the entries in H are nonterminals). Thus, we have justified Proposition 6.6. The method is also quite economical, since it introduces only m² new nonterminals. However, the resulting grammar may contain some useless nonterminals.

6.10 Tree Domains and Gorn Trees

Derivation trees play a very important role in parsing theory and in the proof of a strong version of the pumping lemma for the context-free languages known as Ogden's lemma. Thus, it is important to define derivation trees rigorously. We do so using Gorn trees.

Let N_+ = {1,2,3,...}.

Definition. A tree domain D is a nonempty subset of strings in N_+* satisfying the conditions:

(1) For all u,v ∈ N_+*, if uv ∈ D, then u ∈ D.

(2) For all u ∈ N_+*, for every i ∈ N_+, if ui ∈ D then uj ∈ D for every j, 1 ≤ j ≤ i.

The tree domain

    D = {ǫ, 1, 2, 11, 21, 22, 221, 222, 2211}

is represented as follows:

                ǫ
              ւ   ց
             1     2
           ւ     ւ   ց
          11    21    22
                    ւ    ց
                  221     222
                  ւ
               2211

A tree labeled with symbols from a set ∆ is defined as follows.

Definition. Given a set ∆ of labels, a ∆-tree (for short, a tree) is a total function t : D → ∆, where D is a tree domain. The domain of a tree t is denoted as dom(t). Every string u ∈ dom(t) is called a tree address or a node.

Let ∆ = {f, g, h, a, b}. The tree t : D → ∆, where D is the tree domain of the previous example and t is the function whose graph is

    {(ǫ,f), (1,h), (2,g), (11,a), (21,a), (22,f), (221,h), (222,b), (2211,a)}

is represented as follows:

                f
              ւ   ց
             h     g
           ւ     ւ   ց
          a     a     f
                    ւ    ց
                   h      b
                  ւ
                 a

The outdegree (sometimes called ramification) r(u) of a node u is the cardinality of the set

    {i | ui ∈ dom(t)}.
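The two conditions of the tree-domain definition, and the outdegree, can be checked mechanically. A minimal sketch (the tuple encoding is my own): addresses are tuples of positive integers, with () playing the role of ǫ, tried on the example domain D above.

```python
# Check the two tree-domain conditions and compute outdegrees.

def is_tree_domain(D):
    if () not in D:                      # a nonempty tree domain contains eps
        return False
    for addr in D:
        if any(addr[:k] not in D for k in range(len(addr))):
            return False                 # (1) closed under prefixes
        if addr and any(addr[:-1] + (j,) not in D
                        for j in range(1, addr[-1])):
            return False                 # (2) ui in D implies uj in D, j <= i
    return True

def outdegree(D, u):
    # r(u): the number of i with ui in D
    return sum(1 for v in D if v[:-1] == u and len(v) == len(u) + 1)

D = {(), (1,), (2,), (1, 1), (2, 1), (2, 2),
     (2, 2, 1), (2, 2, 2), (2, 2, 1, 1)}
print(is_tree_domain(D), outdegree(D, (2, 2)))   # True 2
```

For example, {ǫ, 2} fails condition (2), because 2 ∈ D requires 1 ∈ D.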

Note that the outdegree of a node can be infinite. Most of the trees that we shall consider will be finite-branching, that is, for every node u, r(u) will be an integer, and hence finite. If the outdegree of all nodes in a tree is bounded by n, then we can view the domain of the tree as being defined over {1,2,...,n}*. A node of outdegree 0 is called a leaf. The node whose address is ǫ is called the root of the tree. A tree is finite if its domain dom(t) is finite. Given a node u in dom(t), every node of the form ui in dom(t) with i ∈ N_+ is called a son (or immediate successor) of u.

Tree addresses are totally ordered lexicographically: u ≤ v if either u is a prefix of v or there exist strings x,y,z ∈ N_+* and i,j ∈ N_+, with i < j, such that u = xiy and v = xjz. In the first case, we say that u is an ancestor (or predecessor) of v (or u dominates v) and in the second case, that u is to the left of v. If y = ǫ and z = ǫ, we say that xi is a left brother (or left sibling) of xj (i < j). Two tree addresses u and v are independent if u is not a prefix of v and v is not a prefix of u.

Given a finite tree t, the yield of t is the string t(u_1)t(u_2)···t(u_k), where u_1,u_2,...,u_k is the sequence of leaves of t in lexicographic order. For example, the yield of the tree below is aaab:

                f
              ւ   ց
             h     g
           ւ     ւ   ց
          a     a     f
                    ւ    ց
                   h      b
                  ւ
                 a

Given a finite tree t, the depth of t is the integer

    d(t) = max{|u| : u ∈ dom(t)}.

Given a tree t and a node u in dom(t), the subtree rooted at u is the tree t/u, whose domain is the set {v | uv ∈ dom(t)} and such that t/u(v) = t(uv) for all v in dom(t/u).

Another important operation is the operation of tree replacement (or tree substitution).

Definition. Given two trees t_1 and t_2 and a tree address u in t_1, the result of substituting t_2 at u in t_1, denoted by t_1[u ← t_2], is the function whose graph is the set of pairs

    {(v, t_1(v)) | v ∈ dom(t_1), u is not a prefix of v} ∪ {(uv, t_2(v)) | v ∈ dom(t_2)}.

Let t_1 and t_2 be the trees defined by the following diagrams:

Tree t_1:
                f
              ւ   ց
             h     g
           ւ     ւ   ց
          a     a     f
                    ւ    ց
                   h      b
                  ւ
                 a

Tree t_2:
                g
              ւ   ց
             a     b

The tree t_1[22 ← t_2] is defined by the following diagram:

                f
              ւ   ց
             h     g
           ւ     ւ   ց
          a     a     g
                    ւ    ց
                   a      b

We can now define derivation trees and relate derivations to derivation trees.
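A minimal sketch of this substitution on the trees t_1 and t_2 above, with trees encoded as dictionaries from tuple addresses to labels (the encoding is my own); the yield function follows the definition from the previous page.

```python
# Tree replacement t1[u <- t2]: keep the addresses of t1 not below u,
# graft t2 at address u.

t1 = {(): "f", (1,): "h", (2,): "g", (1, 1): "a", (2, 1): "a",
      (2, 2): "f", (2, 2, 1): "h", (2, 2, 2): "b", (2, 2, 1, 1): "a"}
t2 = {(): "g", (1,): "a", (2,): "b"}

def substitute(t1, u, t2):
    kept = {v: lab for v, lab in t1.items() if v[:len(u)] != u}
    grafted = {u + v: lab for v, lab in t2.items()}
    return {**kept, **grafted}

def tree_yield(t):
    # leaves (nodes with no first son) in lexicographic order of address
    leaves = sorted(v for v in t if v + (1,) not in t)
    return "".join(t[v] for v in leaves)

t3 = substitute(t1, (2, 2), t2)
print(tree_yield(t1), tree_yield(t3))   # aaab aaab
```

Sorting tuples of integers agrees with the lexicographic order on leaf addresses, since no leaf is a prefix of another; both trees here happen to have yield aaab.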

6.11 Derivation Trees

Definition. Given a context-free grammar G = (V,Σ,P,S), for any A ∈ N, an A-derivation tree for G is a (V ∪ {ǫ})-tree t (a tree with set of labels V ∪ {ǫ}) such that:

(1) t(ǫ) = A;

(2) For every nonleaf node u ∈ dom(t), if u1,...,uk are the successors of u, then either there is a production B → X_1···X_k in P such that t(u) = B and t(ui) = X_i for all i, 1 ≤ i ≤ k, or B → ǫ ∈ P, t(u) = B and t(u1) = ǫ.

A complete derivation (or parse) tree is an S-tree whose yield belongs to Σ*.

A derivation tree for the grammar

    G_3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

    E → E + T,  E → T,  T → T ∗ F,  T → F,  F → (E),  F → a,

is shown in Figure 6.1. The yield of the derivation tree is a + a ∗ a.

[Figure 6.1: A complete derivation tree]

Derivation trees are associated to derivations inductively as follows.

Definition. Given a context-free grammar G = (V,Σ,P,S), for any A ∈ N, if π : A =⇒^n α is a derivation in G, we construct an A-derivation tree t_π with yield α as follows.

(1) If n = 0, then t_π is the one-node tree such that dom(t_π) = {ǫ} and t_π(ǫ) = A.

(2) If A =⇒^{n−1} λBρ =⇒ λγρ = α, then if t_1 is the A-derivation tree with yield λBρ associated with the derivation A =⇒^{n−1} λBρ, and if t_2 is the tree associated with the production B → γ (that is, if γ = X_1···X_k, then dom(t_2) = {ǫ,1,...,k}, t_2(ǫ) = B, and t_2(i) = X_i for all i, 1 ≤ i ≤ k, or if γ = ǫ, then dom(t_2) = {ǫ,1}, t_2(ǫ) = B, and t_2(1) = ǫ), then

    t_π = t_1[u ← t_2],

where u is the address of the leaf labeled B in t_1.

The tree t_π is the A-derivation tree associated with the derivation A =⇒^n α.

Given the grammar

    G_2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

    E → E + E,  E → E ∗ E,  E → (E),  E → a,

the parse trees associated with two derivations of the string a + a ∗ a are shown in Figure 6.2.

[Figure 6.2: Two derivation trees for a + a ∗ a]

The following proposition is easily shown.

Proposition 6.11. Let G = (V,Σ,P,S) be a context-free grammar. For any derivation A =⇒^n α, there is a unique A-derivation tree associated with this derivation, with yield α. Conversely, for any A-derivation tree t with yield α, there is a unique leftmost derivation A =⇒*_lm α in G having t as its associated derivation tree.

We will now prove a strong version of the pumping lemma for context-free languages due to Bill Ogden (1968).

6.12 Ogden's Lemma

Ogden's lemma states some combinatorial properties of parse trees that are deep enough. The yield w of such a parse tree can be split into 5 substrings u,v,x,y,z such that

    w = uvxyz,

where u,v,x,y,z satisfy certain conditions. It turns out that we get a more powerful version of the lemma if we allow ourselves to mark certain occurrences of symbols in w before invoking the lemma. We can imagine that marked occurrences in a nonempty string w are occurrences of symbols in w in boldface, or red, or any given color (but one color only). For example, given a string w, we can mark the symbols of even index.

More rigorously, we can define a marking of a nonnull string w : {1,...,n} → Σ as any function m : {1,...,n} → {0,1}. Then, a letter w_i in w is a marked occurrence iff m(i) = 1, and an unmarked occurrence if m(i) = 0. The number of marked occurrences in w is equal to

    Σ_{i=1}^{n} m(i).

Ogden's lemma only yields useful information for grammars G generating an infinite language. We could make this hypothesis, but it seems more elegant to use the precondition that the lemma only applies to strings w ∈ L(G) such that w contains at least K marked occurrences, for a constant K large enough. If K is large enough, L(G) will indeed be infinite.

Proposition 6.12 (Ogden's lemma). For every context-free grammar G, there is some integer K > 1 such that, for every string w ∈ Σ+, for every marking of w, if w ∈ L(G) and w contains at least K marked occurrences, then there exists some decomposition of w as w = uvxyz, and some A ∈ N, such that the following properties hold:

(1) There are derivations S =⇒+ uAz, A =⇒+ vAy, and A =⇒+ x, so that

    uv^n xy^n z ∈ L(G)

for all n ≥ 0 (the pumping property);

(2) x contains some marked occurrence;

(3) Either (both u and v contain some marked occurrence), or (both y and z contain some marked occurrence);

(4) vxy contains less than K marked occurrences.

Proof. Let t be any parse tree for w. We call a leaf of t a marked leaf if its label is a marked occurrence in the marked string w. The general idea is to make sure that K is large enough so that parse trees with yield w contain enough repeated nonterminals along some path from the root to some marked leaf.

Let r = |N|, and let

    p = max{2, max{|α| : (A → α) ∈ P}}.

We claim that K = p^{2r+3} does the job.

The key concept in the proof is the notion of a B-node. Given a parse tree t, a B-node is a node with at least two immediate successors u_1, u_2, such that for i = 1,2, either u_i is a marked leaf, or u_i has some marked leaf as a descendant. We construct a path from the root to some marked leaf, so that for every B-node, we pick the leftmost successor with the maximum number of marked leaves as descendants.

Formally, define a path (s_0,...,s_n) from the root to some marked leaf, so that:

(i) Every node s_i has some marked leaf as a descendant, and s_0 is the root of t;

(ii) If s_j is in the path, s_j is not a leaf, and s_j has a single immediate descendant which is either a marked leaf or has marked leaves as its descendants, let s_{j+1} be that unique immediate descendant of s_j;

(iii) If s_j is a B-node in the path, then let s_{j+1} be the leftmost immediate successor of s_j with the maximum number of marked leaves as descendants (assuming that if s_{j+1} is a marked leaf, then it is its own descendant);

(iv) If s_j is a leaf, then it is a marked leaf and n = j.

We will show that the path (s_0,...,s_n) contains at least 2r+3 B-nodes.

Claim: For every i, 0 ≤ i ≤ n, if the path (s_i,...,s_n) contains b B-nodes, then s_i has at most p^b marked leaves as descendants.

Proof. We proceed by backward induction, i.e., by induction on n − i. For i = n, there are no B-nodes, so that b = 0, and there is indeed p^0 = 1 marked leaf s_n. Assume that the claim holds for the path (s_{i+1},...,s_n).

If s_i is not a B-node, then the number b of B-nodes in the path (s_{i+1},...,s_n) is the same as the number of B-nodes in the path (s_i,...,s_n), and s_{i+1} is the only immediate successor of s_i having a marked leaf as a descendant.
By the induction hypothesis, s_{i+1} has at most p^b marked leaves as descendants, and this is also an upper bound on the number of marked leaves which are descendants of s_i.

If s_i is a B-node, then if there are b B-nodes in the path (s_{i+1},...,s_n), there are b+1 B-nodes in the path (s_i,...,s_n). By the induction hypothesis, s_{i+1} has at most p^b marked leaves as descendants. Since s_i is a B-node, s_{i+1} was chosen to be the leftmost immediate successor of s_i having the maximum number of marked leaves as descendants. Thus, since

the outdegree of s_i is at most p, and each of its immediate successors has at most p^b marked leaves as descendants, the node s_i has at most p·p^b = p^{b+1} marked leaves as descendants, as desired.

Applying the claim to s_0, since w has at least K = p^{2r+3} marked occurrences, we have p^b ≥ p^{2r+3}, and since p ≥ 2, we have b ≥ 2r+3, and the path (s_0,...,s_n) contains at least 2r+3 B-nodes (note that this would not follow if we had p = 1).

Let us now select the lowest 2r+3 B-nodes in the path (s_0,...,s_n), and denote them (b_1,...,b_{2r+3}). Every B-node b_i has at least two immediate successors u_i < v_i such that u_i or v_i is on the path (s_0,...,s_n). If the path goes through u_i, we say that b_i is a right B-node and if the path goes through v_i, we say that b_i is a left B-node.

Since 2r+3 = r+2+r+1, either there are r+2 left B-nodes or there are r+2 right B-nodes in the path (b_1,...,b_{2r+3}). Let us assume that there are r+2 left B-nodes, the other case being similar.

Let (d_1,...,d_{r+2}) be the lowest r+2 left B-nodes in the path. Since there are r+1 B-nodes in the sequence (d_2,...,d_{r+2}), and there are only r distinct nonterminals, there are two nodes d_i and d_j, with 2 ≤ i < j ≤ r+2, such that t(d_i) = t(d_j) = A, for some A ∈ N. We can assume that d_i is an ancestor of d_j, and thus, d_j = d_iα, for some α ≠ ǫ.

If we prune out the subtree t/d_i rooted at d_i from t, we get an S-derivation tree having a yield of the form uAz, and we have a derivation of the form S =⇒+ uAz, since there are at least r+2 left B-nodes on the path, and we are looking at the lowest r+1 left B-nodes. Considering the subtree t/d_i, pruning out the subtree t/d_j rooted at α in t/d_i, we get an A-derivation tree having a yield of the form vAy, and we have a derivation of the form A =⇒+ vAy. Finally, the subtree t/d_j is an A-derivation tree with yield x, and we have a derivation A =⇒+ x. This proves (1) of the lemma.

Since s_n is a marked leaf and a descendant of d_j, x contains some marked occurrence, proving (2).
Since d_1 is a left B-node, some left sibling of the immediate successor of d_1 on the path has some marked leaf in u as a descendant. Similarly, since d_i is a left B-node, some left sibling of the immediate successor of d_i on the path has some marked leaf in v as a descendant. This proves (3).

The path (d_j,...,b_{2r+3}) has at most 2r+1 B-nodes, and by the claim shown earlier, d_j has at most p^{2r+1} marked leaves as descendants. Since p^{2r+1} < p^{2r+3} = K, this proves (4).

Observe that condition (2) implies that x ≠ ǫ, and condition (3) implies that either u ≠ ǫ and v ≠ ǫ, or y ≠ ǫ and z ≠ ǫ. Thus, the pumping condition (1) implies that the set {uv^n xy^n z | n ≥ 0} is an infinite subset of L(G), and L(G) is indeed infinite, as we mentioned earlier. Note that K ≥ 3, and in fact, K ≥ 32.

The standard pumping lemma due to Bar-Hillel, Perles, and Shamir is obtained by letting all occurrences be marked in w ∈ L(G).

Proposition 6.13. For every context-free grammar G (without ǫ-rules), there is some integer K > 1 such that, for every string w ∈ Σ+, if w ∈ L(G) and |w| ≥ K, then there exists some decomposition of w as w = uvxyz, and some A ∈ N, such that the following properties hold:

(1) There are derivations S =⇒+ uAz, A =⇒+ vAy, and A =⇒+ x, so that

    uv^n xy^n z ∈ L(G)

for all n ≥ 0 (the pumping property);

(2) x ≠ ǫ;

(3) Either v ≠ ǫ or y ≠ ǫ;

(4) |vxy| ≤ K.

A stronger version could be stated, and we are just following tradition in stating this standard version of the pumping lemma.

Ogden's lemma or the pumping lemma can be used to show that certain languages are not context-free. The method is to proceed by contradiction, i.e., to assume (contrary to what we wish to prove) that a language L is indeed context-free, and derive a contradiction of Ogden's lemma (or of the pumping lemma). Thus, as in the case of the regular languages, it would be helpful to see what the negation of Ogden's lemma is, and for this, we first state Ogden's lemma as a logical formula.

For any nonnull string w : {1,...,n} → Σ, for any marking m : {1,...,n} → {0,1} of w, for any substring y of w, where w = xyz, with |x| = h and k = |y|, the number of marked occurrences in y, denoted as |m(y)|, is defined as

    |m(y)| = Σ_{i=h+1}^{h+k} m(i).

We will also use the following abbreviations:

    nat   = {0,1,2,...},
    nat32 = {32,33,...},
    A ≡ w = uvxyz,
    B ≡ |m(x)| ≥ 1,
    C ≡ (|m(u)| ≥ 1 ∧ |m(v)| ≥ 1) ∨ (|m(y)| ≥ 1 ∧ |m(z)| ≥ 1),
    D ≡ |m(vxy)| < K,
    P ≡ ∀n: nat (uv^n xy^n z ∈ L(G)).

Ogden's lemma can then be stated as

    ∀G: CFG  ∃K: nat32  ∀w: Σ*  ∀m: marking
        ((w ∈ L(G) ∧ |m(w)| ≥ K) ⊃ (∃u,v,x,y,z: Σ*  (A ∧ B ∧ C ∧ D ∧ P))).

Recalling that

    ¬(A ∧ B ∧ C ∧ D ∧ P) ≡ ¬(A ∧ B ∧ C ∧ D) ∨ ¬P ≡ (A ∧ B ∧ C ∧ D) ⊃ ¬P

and

    ¬(P ⊃ Q) ≡ P ∧ ¬Q,

the negation of Ogden's lemma can be stated as

    ∃G: CFG  ∀K: nat32  ∃w: Σ*  ∃m: marking
        ((w ∈ L(G) ∧ |m(w)| ≥ K) ∧ (∀u,v,x,y,z: Σ*  ((A ∧ B ∧ C ∧ D) ⊃ ¬P))).

Since

    ¬P ≡ ∃n: nat (uv^n xy^n z ∉ L(G)),

in order to show that Ogden's lemma is contradicted, one needs to show that for some context-free grammar G, for every K ≥ 2, there is some string w ∈ L(G) and some marking m of w with at least K marked occurrences in w, such that for every possible decomposition w = uvxyz satisfying the constraints A ∧ B ∧ C ∧ D, there is some n ≥ 0 such that uv^n xy^n z ∉ L(G). When proceeding by contradiction, we have a language L that we are (wrongly) assuming to be context-free and we can use any CFG grammar G generating L. The creative part of the argument is to pick the right w ∈ L and the right marking of w (not making any assumption on K).

As an illustration, we show that the language

    L = {a^n b^n c^n | n ≥ 1}

is not context-free. Since L is infinite, we will be able to use the pumping lemma.

The proof proceeds by contradiction. If L were context-free, there would be some context-free grammar G such that L = L(G), and some constant K > 1 as in Ogden's lemma. Let w = a^K b^K c^K, and choose the b's as marked occurrences. Then by Ogden's lemma, x contains some marked occurrence, and either both u,v or both y,z contain some marked occurrence. Assume that both u and v contain some b. We have the following situation:

    a···a b···b   b···b   b···b c···c
    \____u____/   \_v_/   \___xyz___/

If we consider the string uvvxyyz, the number of a's is still K, but the number of b's is strictly greater than K since v contains at least one b, and thus uvvxyyz ∉ L, a contradiction.

If both y and z contain some b, we will also reach a contradiction because in the string uvvxyyz, the number of c's is still K, but the number of b's is strictly greater than K.

Having reached a contradiction in all cases, we conclude that L is not context-free.

Let us now show that the language

    L = {a^m b^n c^m d^n | m,n ≥ 1}

is not context-free. Again, we proceed by contradiction. This time, let

    w = a^K b^K c^K d^K,

where the b's and c's are marked occurrences.

By Ogden's lemma, either both u,v contain some marked occurrence, or both y,z contain some marked occurrence, and x contains some marked occurrence. Let us first consider the case where both u,v contain some marked occurrence.

If v contains some b, since uvvxyyz ∈ L, v must contain only b's, since otherwise we would have a bad string in L, and we have the following situation:

    a···a b···b   b···b   b···b c···c d···d
    \____u____/   \_v_/   \_____xyz_____/

Since uvvxyyz ∈ L, the only way to preserve an equal number of b's and d's is to have y ∈ d+. But then, vxy contains c^K, which contradicts (4) of Ogden's lemma.

If v contains some c, since x also contains some marked occurrence, it must be some c, and v contains only c's, and we have the following situation:

    a···a b···b c···c   c···c   c···c d···d
    \______u______/     \_v_/   \___xyz___/

Since uvvxyyz ∈ L and the number of a's is still K whereas the number of c's is strictly more than K, this case is impossible.

Let us now consider the case where both y,z contain some marked occurrence. Reasoning as before, the only possibility is that v ∈ a+ and y ∈ c+:

    a···a   a···a   a···a b···b c···c   c···c   c···c d···d
    \_u_/   \_v_/   \______x______/     \_y_/   \___z___/

But then, vxy contains b^K, which contradicts (4) of Ogden's lemma. Since a contradiction was obtained in all cases, L is not context-free.
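The contradiction just carried out for L = {a^n b^n c^n | n ≥ 1} can also be checked mechanically for a small illustrative value of K (the brute-force search and the choice K = 4 are my own devices; the lemma itself only guarantees that some sufficiently large K exists):

```python
# Brute force: every decomposition w = uvxyz of a^K b^K c^K with
# vy != eps and |vxy| <= K fails the pumping property for some n.

def in_L(s):
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n

def refutes_pumping(w, K):
    N = len(w)
    for i in range(N + 1):                              # u = w[:i]
        for j in range(i, N + 1):                       # v = w[i:j]
            for k in range(j, N + 1):                   # x = w[j:k]
                for l in range(k, min(i + K, N) + 1):   # y = w[k:l], |vxy| <= K
                    if j == i and l == k:
                        continue                        # v = y = eps
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:l], w[l:]
                    # the pumping property would require
                    # u v^n x y^n z in L for all n >= 0
                    if all(in_L(u + v * n + x + y * n + z) for n in (0, 2)):
                        return False                    # a decomposition pumps
    return True

K = 4
print(refutes_pumping("a" * K + "b" * K + "c" * K, K))   # True
```

The search confirms that no admissible decomposition survives even the two pumping exponents n = 0 and n = 2, which is exactly the shape of the negated lemma above.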

Ogden's lemma can also be used to show that the context-free language

    {a^m b^n c^n | m,n ≥ 1} ∪ {a^m b^m c^n | m,n ≥ 1}

is inherently ambiguous. The proof is quite involved.

Another corollary of the pumping lemma is that it is decidable whether a context-free grammar generates an infinite language.

Proposition 6.14. Given any context-free grammar G, if K is the constant of Ogden's lemma, then the following equivalence holds: L(G) is infinite iff there is some w ∈ L(G) such that K ≤ |w| < 2K.

Proof. Let K = p^{2r+3} be the constant from the proof of Ogden's lemma. If there is some w ∈ L(G) such that |w| ≥ K, we already observed that the pumping lemma implies that L(G) contains an infinite subset of the form {uv^n xy^n z | n ≥ 0}. Conversely, assume that L(G) is infinite. If |w| < K for all w ∈ L(G), then L(G) is finite. Thus, there is some w ∈ L(G) such that |w| ≥ K. Let w ∈ L(G) be a minimal string such that |w| ≥ K. By the pumping lemma, we can write w as w = uvxyz, where x ≠ ǫ, vy ≠ ǫ, and |vxy| ≤ K. By the pumping property, uxz ∈ L(G). If |w| ≥ 2K, then

    |uxz| = |uvxyz| − |vy| > |uvxyz| − |vxy| ≥ 2K − K = K,

and |uxz| < |uvxyz|, contradicting the minimality of w. Thus, we must have |w| < 2K.

In particular, if G is in Chomsky Normal Form, it can be shown that we just have to consider derivations of length at most 4K.

6.13 Pushdown Automata

We have seen that the regular languages are exactly the languages accepted by DFA's or NFA's. The context-free languages are exactly the languages accepted by pushdown automata, for short, PDA's. However, although there are two versions of PDA's, deterministic and nondeterministic, contrary to the fact that every NFA can be converted to a DFA, nondeterministic PDA's are strictly more powerful than deterministic PDA's (DPDA's). Indeed, there are context-free languages that cannot be accepted by DPDA's. Thus, the natural machine model for the context-free languages is nondeterministic, and for this reason, we just use the abbreviation PDA, as opposed to NPDA.

We adopt a definition of a PDA in which the pushdown store, or stack, must not be empty for a move to take place.
Other authors allow PDA's to make a move when the stack is empty. Novices seem to be confused by such moves, and this is why we do not allow moves with an empty stack.

Intuitively, a PDA consists of an input tape, a nondeterministic finite-state control, and a stack.

Given any set X, possibly infinite, let P_fin(X) be the set of all finite subsets of X.

Definition. A pushdown automaton is a 7-tuple M = (Q, Σ, Γ, δ, q_0, Z_0, F), where

Q is a finite set of states;
Σ is a finite input alphabet;
Γ is a finite pushdown store (or stack) alphabet;
q_0 ∈ Q is the start state (or initial state);
Z_0 ∈ Γ is the initial stack symbol (or bottom marker);
F ⊆ Q is the set of final (or accepting) states;
δ : Q × (Σ ∪ {ǫ}) × Γ → P_fin(Q × Γ*) is the transition function.

A transition is of the form (q, γ) ∈ δ(p, a, Z), where p,q ∈ Q, Z ∈ Γ, γ ∈ Γ* and a ∈ Σ ∪ {ǫ}. A transition of the form (q, γ) ∈ δ(p, ǫ, Z) is called an ǫ-transition (or ǫ-move).

The way a PDA operates is explained in terms of Instantaneous Descriptions, for short ID's. Intuitively, an Instantaneous Description is a snapshot of the PDA. An ID is a triple of the form

    (p, u, α) ∈ Q × Σ* × Γ*.

The idea is that p is the current state, u is the remaining input, and α represents the stack. It is important to note that we use the convention that the leftmost symbol in α represents the topmost stack symbol.

Given a PDA M, we define a relation ⊢_M between pairs of ID's. This is very similar to the derivation relation =⇒_G associated with a context-free grammar. Intuitively, a PDA scans the input tape symbol by symbol from left to right, making moves that cause a change of state, an update to the stack (but only at the top), and either advancing the reading head to the next symbol, or not moving the reading head during an ǫ-move.

Definition. Given a PDA M = (Q, Σ, Γ, δ, q_0, Z_0, F), the relation ⊢_M is defined as follows:

(1) For any move (q, γ) ∈ δ(p, a, Z), where p,q ∈ Q, Z ∈ Γ, a ∈ Σ, for every ID of the form (p, av, Zα), we have

    (p, av, Zα) ⊢_M (q, v, γα).

(2) For any move (q, γ) ∈ δ(p, ǫ, Z), where p,q ∈ Q, Z ∈ Γ, for every ID of the form (p, u, Zα), we have

    (p, u, Zα) ⊢_M (q, u, γα).

As usual, ⊢+_M is the transitive closure of ⊢_M, and ⊢*_M is the reflexive and transitive closure of ⊢_M.

A move of the form

    (p, au, Zα) ⊢_M (q, u, α),

where a ∈ Σ ∪ {ǫ}, is called a pop move.

A move on a real input symbol a ∈ Σ causes this input symbol to be consumed, and the reading head advances to the next input symbol. On the other hand, during an ǫ-move, the reading head stays put. When

    (p, u, α) ⊢*_M (q, v, β),

we say that we have a computation.

There are several equivalent ways of defining acceptance by a PDA.

Definition. Given a PDA M = (Q, Σ, Γ, δ, q_0, Z_0, F), the following languages are defined:

(1) T(M) = {w ∈ Σ* | (q_0, w, Z_0) ⊢*_M (f, ǫ, α), where f ∈ F, and α ∈ Γ*}. We say that T(M) is the language accepted by M by final state.

(2) N(M) = {w ∈ Σ* | (q_0, w, Z_0) ⊢*_M (q, ǫ, ǫ), where q ∈ Q}. We say that N(M) is the language accepted by M by empty stack.

(3) L(M) = {w ∈ Σ* | (q_0, w, Z_0) ⊢*_M (f, ǫ, ǫ), where f ∈ F}. We say that L(M) is the language accepted by M by final state and empty stack.

In all cases, note that the input w must be consumed entirely.

The following proposition shows that the acceptance mode does not matter for PDA's. As we will see shortly, it does matter for DPDA's.

Proposition 6.15. For any language L, the following facts hold.

(1) If L = T(M) for some PDA M, then L = L(M′) for some PDA M′.

(2) If L = N(M) for some PDA M, then L = L(M′) for some PDA M′.

(3) If L = L(M) for some PDA M, then L = T(M′) for some PDA M′.

(4) If L = L(M) for some PDA M, then L = N(M′) for some PDA M′.

In view of Proposition 6.15, the three acceptance modes T, N, L are equivalent.

The following PDA accepts the language

    L = {a^n b^n | n ≥ 1}

by empty stack. Q = {1, 2}, Γ = {Z_0, a};

    (1, a)  ∈ δ(1, a, Z_0),
    (1, aa) ∈ δ(1, a, a),
    (2, ǫ)  ∈ δ(1, b, a),
    (2, ǫ)  ∈ δ(2, b, a).

The following PDA accepts the language

    L = {a^n b^n | n ≥ 1}

by final state (and also by empty stack). Q = {1, 2, 3}, Γ = {Z_0, A, a}, F = {3};

    (1, A)  ∈ δ(1, a, Z_0),
    (1, aA) ∈ δ(1, a, A),
    (1, aa) ∈ δ(1, a, a),
    (2, ǫ)  ∈ δ(1, b, a),
    (2, ǫ)  ∈ δ(2, b, a),
    (3, ǫ)  ∈ δ(1, b, A),
    (3, ǫ)  ∈ δ(2, b, A).

DPDA's are defined as follows.

Definition. A PDA M = (Q, Σ, Γ, δ, q_0, Z_0, F) is a deterministic PDA (for short, DPDA), iff the following conditions hold for all (p, Z) ∈ Q × Γ: either

(1) |δ(p, a, Z)| = 1 for all a ∈ Σ, and δ(p, ǫ, Z) = ∅, or

(2) δ(p, a, Z) = ∅ for all a ∈ Σ, and |δ(p, ǫ, Z)| = 1.
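A minimal sketch simulating the first PDA above and acceptance by empty stack; IDs are triples (state, remaining input, stack) as in the text, with the top of the stack leftmost. The dictionary encoding of δ and the helper name are my own.

```python
# Breadth-first simulation of the PDA for L = { a^n b^n | n >= 1 },
# accepting by empty stack. This PDA has no eps-moves, so the frontier
# always shrinks with the input and the search terminates.

delta = {
    (1, "a", "Z0"): {(1, ("a",))},        # (1, a)  in delta(1, a, Z0)
    (1, "a", "a"):  {(1, ("a", "a"))},    # (1, aa) in delta(1, a, a)
    (1, "b", "a"):  {(2, ())},            # (2, eps) in delta(1, b, a)
    (2, "b", "a"):  {(2, ())},            # (2, eps) in delta(2, b, a)
}

def accepts_by_empty_stack(w, start=1, Z0="Z0"):
    ids = {(start, w, (Z0,))}
    while ids:
        new = set()
        for p, u, stack in ids:
            if u == "" and stack == ():
                return True               # input consumed and stack empty
            if not stack:
                continue                  # no move is possible on an empty stack
            Z, rest = stack[0], stack[1:]
            for a in {u[:1], ""}:         # consume one symbol, or an eps-move
                for q, gamma in delta.get((p, a, Z), ()):
                    new.add((q, u[len(a):], gamma + rest))
        ids = new
    return False

print(accepts_by_empty_stack("aabb"), accepts_by_empty_stack("aab"))   # True False
```

The simulation tracks a set of IDs rather than a single one, which is exactly what the nondeterministic relation ⊢_M calls for.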

A DPDA operates in realtime iff it has no ǫ-transitions.

It turns out that for DPDA's the most general acceptance mode is by final state. Indeed, there are languages that can only be accepted deterministically as T(M). The language

    L = {a^m b^n | m ≥ n ≥ 1}

is such an example. The problem is that a^m b is a prefix of all strings a^m b^n, with m ≥ n ≥ 2.

A language L is a deterministic context-free language iff L = T(M) for some DPDA M.

It is easily shown that if L = N(M) (or L = L(M)) for some DPDA M, then L = T(M′) for some DPDA M′ easily constructed from M.

A PDA is unambiguous iff for every w ∈ Σ*, there is at most one computation

    (q_0, w, Z_0) ⊢* ID_n,

where ID_n is an accepting ID.

There are context-free languages that are not accepted by any DPDA. For example, it can be shown that the languages

    L_1 = {a^n b^n | n ≥ 1} ∪ {a^n b^{2n} | n ≥ 1}

and

    L_2 = {w w^R | w ∈ {a,b}*}

are not accepted by any DPDA. Also note that unambiguous grammars for these languages can be easily given.

We now show that every context-free language is accepted by a PDA.

6.14 From Context-Free Grammars To PDA's

We show how a PDA can be easily constructed from a context-free grammar. Although simple, the construction is not practical for parsing purposes, since the resulting PDA is horribly nondeterministic.

Given a context-free grammar G = (V, Σ, P, S), we define a one-state PDA M as follows:

    Q = {q_0};  Γ = V;  Z_0 = S;  F = ∅;

For every rule (A → α) ∈ P, there is a transition

    (q_0, α) ∈ δ(q_0, ǫ, A).

For every a ∈ Σ, there is a transition

    (q_0, ǫ) ∈ δ(q_0, a, a).

The intuition is that a computation of M mimics a leftmost derivation in G. One might say that we have a pop/expand PDA.
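A minimal sketch of this pop/expand PDA, illustrated on the grammar S → aSb | ab (my example, not from the notes). The single state q_0 is left implicit; IDs are pairs (remaining input, stack), with the stack kept as a string of grammar symbols, top leftmost.

```python
# Pop/expand PDA built from a CFG, accepting by empty stack.

RULES = {"S": ["aSb", "ab"]}              # nonterminals are the dict keys

def accepts(w, start="S", limit=10_000):
    ids, seen, steps = {(w, start)}, set(), 0
    while ids and steps < limit:          # limit guards left-recursive grammars
        steps += 1
        new = set()
        for u, stack in ids:
            if (u, stack) in seen:
                continue
            seen.add((u, stack))
            if u == "" and stack == "":
                return True               # accepted by empty stack
            if stack == "":
                continue
            Z, rest = stack[0], stack[1:]
            if Z in RULES:                # expand: (q0, alpha) in delta(q0, eps, A)
                for alpha in RULES[Z]:
                    new.add((u, alpha + rest))
            elif u[:1] == Z:              # pop: (q0, eps) in delta(q0, a, a)
                new.add((u[1:], rest))
        ids = new
    return False

print(accepts("aaabbb"), accepts("aabbb"))   # True False
```

Each expand step rewrites the leftmost nonterminal of the sentential form on the stack, so an accepting computation traces out exactly a leftmost derivation, as Claims 1 and 2 below make precise.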

Proposition 6.16. Given any context-free grammar G = (V, Σ, P, S), the PDA M just described accepts L(G) by empty stack, i.e., L(G) = N(M).

Proof sketch. The following two claims are proved by induction.

Claim 1: For all u, v ∈ Σ* and all α ∈ NV* ∪ {ǫ}, if S =⇒*lm uα, then

(q0, uv, S) ⊢* (q0, v, α).

Claim 2: For all u, v ∈ Σ* and all α ∈ V*, if

(q0, uv, S) ⊢* (q0, v, α),

then S =⇒*lm uα.

We now show how a PDA can be converted to a context-free grammar.

6.15 From PDA's To Context-Free Grammars

The construction of a context-free grammar from a PDA is not really difficult, but it is quite messy. The construction is simplified if we first convert a PDA to an equivalent PDA such that for every move (q, γ) ∈ δ(p, a, Z) (where a ∈ Σ ∪ {ǫ}), we have |γ| <= 2. In some sense, we form a kind of PDA in Chomsky Normal Form.

Proposition 6.17. Given any PDA

M = (Q, Σ, Γ, δ, q0, Z0, F),

another PDA

M' = (Q', Σ, Γ', δ', q0', Z0', F')

can be constructed, such that L(M) = L(M') and the following conditions hold:

(1) There is a one-to-one correspondence between accepting computations of M and M';

(2) If M has no ǫ-moves, then M' has no ǫ-moves; if M is unambiguous, then M' is unambiguous;

(3) For all p ∈ Q', all a ∈ Σ ∪ {ǫ}, and all Z ∈ Γ', if (q, γ) ∈ δ'(p, a, Z), then q ≠ q0' and |γ| <= 2.

The crucial point of the construction is that accepting computations of a PDA accepting by empty stack and final state can be decomposed into subcomputations of the form

(p, uv, Zα) ⊢* (q, v, α),

where for every intermediate ID (s, w, β), we have β = γα for some γ ≠ ǫ.

The nonterminals of the grammar constructed from the PDA M are triples of the form [p, Z, q] such that

(p, u, Z) ⊢+ (q, ǫ, ǫ)

for some u ∈ Σ*.

Given a PDA M = (Q, Σ, Γ, δ, q0, Z0, F) satisfying the conditions of Proposition 6.17, we construct a context-free grammar G = (V, Σ, P, S) as follows:

V = {[p, Z, q] | p, q ∈ Q, Z ∈ Γ} ∪ Σ ∪ {S},

where S is a new symbol, and the productions are defined as follows: for all p, q ∈ Q, all a ∈ Σ ∪ {ǫ}, all X, Y, Z ∈ Γ, we have:

(1) S → ǫ ∈ P, if q0 ∈ F;

(2) S → a ∈ P, if (f, ǫ) ∈ δ(q0, a, Z0), and f ∈ F;

(3) S → a[p, X, f] ∈ P, for every f ∈ F, if (p, X) ∈ δ(q0, a, Z0);

(4) S → a[p, X, s][s, Y, f] ∈ P, for every f ∈ F, for every s ∈ Q, if (p, XY) ∈ δ(q0, a, Z0);

(5) [p, Z, q] → a ∈ P, if (q, ǫ) ∈ δ(p, a, Z) and p ≠ q0;

(6) [p, Z, s] → a[q, X, s] ∈ P, for every s ∈ Q, if (q, X) ∈ δ(p, a, Z) and p ≠ q0;

(7) [p, Z, t] → a[q, X, s][s, Y, t] ∈ P, for every s, t ∈ Q, if (q, XY) ∈ δ(p, a, Z) and p ≠ q0.

Proposition 6.18. Given any PDA M = (Q, Σ, Γ, δ, q0, Z0, F) satisfying the conditions of Proposition 6.17, the context-free grammar G = (V, Σ, P, S) constructed as above generates L(M), i.e., L(G) = L(M). Furthermore, G is unambiguous iff M is unambiguous.

Proof sketch. We have to prove that

L(G) = {w ∈ Σ+ | (q0, w, Z0) ⊢+ (f, ǫ, ǫ), f ∈ F} ∪ {ǫ | q0 ∈ F}.

For this, the following claim is proved by induction.

Claim: For all p, q ∈ Q, all Z ∈ Γ, all k >= 1, and all w ∈ Σ*,

[p, Z, q] =⇒^k_lm w iff (p, w, Z) ⊢+ (q, ǫ, ǫ).

Using the claim, it is possible to prove that L(G) = L(M).

In view of Propositions 6.16 and 6.18, the family of context-free languages is exactly the family of languages accepted by PDA's. It is harder to give a grammatical characterization of the deterministic context-free languages. One method is to use Knuth's LR(k)-grammars. Another characterization can be given in terms of the strict deterministic grammars due to Harrison and Havel.

6.16 The Chomsky-Schutzenberger Theorem

Unfortunately, there is no characterization of the context-free languages analogous to the characterization of the regular languages in terms of closure properties (R(Σ)). However, there is a famous theorem due to Chomsky and Schutzenberger showing that every context-free language can be obtained from a special language, the Dyck set, in terms of homomorphisms, inverse homomorphisms and intersection with the regular languages.

Definition. Given the alphabet Σ2 = {a, b, ā, b̄}, define the relation ≃ on Σ2* as follows: For all u, v ∈ Σ2*,

u ≃ v iff ∃x, y ∈ Σ2*, u = xaāy, v = xy or u = xbb̄y, v = xy.

Let ≃* be the reflexive and transitive closure of ≃, and let

D2 = {w ∈ Σ2* | w ≃* ǫ}.

This is the Dyck set on two letters. It is not hard to prove that D2 is context-free.

Theorem 6.19. (Chomsky-Schutzenberger) For every PDA M = (Q, Σ, Γ, δ, q0, Z0, F), there is a regular language R and two homomorphisms g, h such that

L(M) = h(g^(-1)(D2) ∩ R).
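Membership in D2 can be decided without computing the reduction explicitly: a word reduces to ǫ by cancelling adjacent aā and bb̄ factors iff a stack-based matcher (with a and b acting as opening brackets, ā and b̄ as the matching closing brackets) empties its stack exactly at the end of the word. A small sketch, writing ā as 'A' and b̄ as 'B' as an ASCII stand-in:

```python
def in_dyck2(w: str) -> bool:
    """Membership in the Dyck set D2 over {a, b, abar, bbar},
    with 'A' standing for abar and 'B' for bbar."""
    stack = []
    for c in w:
        if c in 'ab':
            stack.append(c)            # a, b open a pair
        elif c == 'A':
            if not stack or stack.pop() != 'a':
                return False           # abar must cancel an adjacent a
        elif c == 'B':
            if not stack or stack.pop() != 'b':
                return False           # bbar must cancel an adjacent b
        else:
            return False               # not a letter of Sigma_2
    return not stack                   # the whole word reduced to epsilon
```

For instance a b b̄ ā is in D2 (cancel bb̄, then aā), whereas a b̄ b ā is not, since only the ordered factors aā and bb̄ cancel.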

Observe that Theorem 6.19 yields another proof of the fact that the language accepted by a PDA is context-free. Indeed, the context-free languages are closed under homomorphisms, inverse homomorphisms, and intersection with the regular languages, and D2 is context-free.

From the characterization of a-transducers in terms of homomorphisms, inverse homomorphisms, and intersection with regular languages, we deduce that every context-free language is the image of D2 under some a-transduction.

Chapter 7

A Survey of LR-Parsing Methods

In this chapter, we give a brief survey of LR-parsing methods. We begin with the definition of characteristic strings and the construction of Knuth's LR(0)-characteristic automaton. Next, we describe the shift/reduce algorithm. The need for lookahead sets is motivated by the resolution of conflicts. A unified method for computing FIRST, FOLLOW and LALR(1) lookahead sets is presented. The method uses the same graph algorithm Traverse, which finds all nodes reachable from a given node and computes the union of predefined sets assigned to these nodes. Hence, the only difference between the various algorithms for computing FIRST, FOLLOW and LALR(1) lookahead sets lies in the fact that the initial sets and the graphs are computed in different ways. The method can be viewed as an efficient way of solving a set of simultaneous recursive equations with set variables. The method is inspired by DeRemer and Pennello's method for computing LALR(1) lookahead sets. However, DeRemer and Pennello use a more sophisticated graph algorithm for finding strongly connected components. We use a slightly less efficient but simpler algorithm (a depth-first search). We conclude with a brief presentation of LR(1) parsers.

7.1 LR(0)-Characteristic Automata

The purpose of LR-parsing, invented by D. Knuth in the mid sixties, is the following: Given a context-free grammar G, for any terminal string w ∈ Σ*, find out whether w belongs to the language L(G) generated by G, and if so, construct a rightmost derivation of w, in a deterministic fashion. Of course, this is not possible for all context-free grammars, but only for those that correspond to languages that can be recognized by a deterministic PDA (DPDA). Knuth's major discovery was that for a certain type of grammars, the LR(k)-grammars, a certain kind of DPDA could be constructed from the grammar (shift/reduce parsers). The k in LR(k) refers to the amount of lookahead that is necessary in order to proceed deterministically.

It turns out that k = 1 is sufficient, but even in this case, Knuth's construction produces very large DPDA's, and his original LR(1) method is not practical. Fortunately, around 1969, Frank DeRemer, in his MIT Ph.D. thesis, investigated a practical restriction of Knuth's method, known as SLR(k), and soon after, the LALR(k) method was

discovered. The SLR(k) and the LALR(k) methods are both based on the construction of the LR(0)-characteristic automaton from a grammar G, and we begin by explaining this construction. The additional ingredient needed to obtain an SLR(k) or an LALR(k) parser from an LR(0) parser is the computation of lookahead sets. In the SLR case, the FOLLOW sets are needed, and in the LALR case, a more sophisticated version of the FOLLOW sets is needed. We will consider the construction of these sets in the case k = 1. We will discuss the shift/reduce algorithm and consider briefly ways of building LR(1)-parsing tables.

For simplicity of exposition, we first assume that grammars have no ǫ-rules. This restriction will be lifted in Section 7.10.

Given a reduced context-free grammar G = (V, Σ, P, S') augmented with a start production S' → S, where S' does not appear in any other productions, the set C_G of characteristic strings of G is the following subset of V* (watch out, not Σ*):

C_G = {αβ ∈ V* | S' =⇒*rm αBv =⇒rm αβv, α, β ∈ V*, v ∈ Σ*, B → β ∈ P}.

In words, C_G is a certain set of prefixes of sentential forms obtained in rightmost derivations: those obtained by truncating the part of the sentential form immediately following the rightmost symbol in the righthand side of the production applied at the last step.

The fundamental property of LR-parsing, due to D. Knuth, is that C_G is a regular language. Furthermore, a DFA, DCG, accepting C_G can be constructed from G.

Conceptually, it is simpler to construct the DFA accepting C_G in two steps:

(1) First, construct a nondeterministic automaton with ǫ-rules, NCG, accepting C_G.

(2) Apply the subset construction (Rabin and Scott's method) to NCG to obtain the DFA DCG.

In fact, a careful inspection of the two steps of this construction reveals that it is possible to construct DCG directly in a single step, and this is the construction usually found in most textbooks on parsing.

The nondeterministic automaton NCG accepting C_G is defined as follows: The states of NCG are marked productions, where a marked production is a string of the form A → α.β, where A → αβ is a production, and "." is a symbol not in V called the dot, which can appear anywhere within αβ. The start state is S' → .S, and the transitions are defined as follows:

(a) For every terminal a ∈ Σ, if A → α.aβ is a marked production, with α, β ∈ V*, then there is a transition on input a from state A → α.aβ to state A → αa.β obtained by shifting the dot. Such a transition is shown in Figure 7.1.

Figure 7.1: Transition on terminal input a (from state A → α.aβ to state A → αa.β).

Figure 7.2: Transitions from state A → α.Bβ (on input B to state A → αB.β, and on ǫ to the states B → .γ1, ..., B → .γm).

(b) For every nonterminal B ∈ N, if A → α.Bβ is a marked production, with α, β ∈ V*, then there is a transition on input B from state A → α.Bβ to state A → αB.β (obtained by shifting the dot), and transitions on input ǫ (the empty string) to all states B → .γi, for all productions B → γi with left-hand side B. Such transitions are shown in Figure 7.2.

(c) A state is final if and only if it is of the form A → β. (that is, the dot is in the rightmost position).

The above construction is illustrated by the following example:

Example 1. Consider the grammar G1 given by:

S → E
E → aEb
E → ab

The NFA for C_G1 is shown in Figure 7.3. The result of making the NFA for C_G1 deterministic is shown in Figure 7.4 (where transitions to the dead state have been omitted). The internal structure of the states 1, ..., 6 is shown below:

1: S → .E,  E → .aEb,  E → .ab
2: E → a.Eb,  E → a.b,  E → .aEb,  E → .ab
3: E → aE.b
4: S → E.
5: E → ab.
6: E → aEb.

The next example is slightly more complicated.

Example 2. Consider the grammar G2 given by:

S → E
E → E + T
E → T
T → T ∗ a
T → a

The result of making the NFA for C_G2 deterministic is shown in Figure 7.5 (where transitions to the dead state have been omitted). The internal structure of the states 1, ..., 8
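The states of Example 1 can be computed mechanically. The sketch below (a minimal, hypothetical closure/goto implementation of the one-step construction, with S playing the role of the augmented start symbol and items encoded as (lhs, rhs, dot position) triples) produces exactly the six states listed above:

```python
# LR(0) item-set construction for Example 1: S -> E, E -> aEb | ab.
PRODS = [('S', 'E'), ('E', 'aEb'), ('E', 'ab')]

def closure(items):
    """Add B -> .gamma for every item with the dot before nonterminal B."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, d in list(items):
            if d < len(rhs) and rhs[d].isupper():
                for l, r in PRODS:
                    if l == rhs[d] and (l, r, 0) not in items:
                        items.add((l, r, 0))
                        changed = True
    return frozenset(items)

def goto(state, sym):
    """Shift the dot over sym in every item where it applies, then close."""
    return closure({(l, r, d + 1) for l, r, d in state
                    if d < len(r) and r[d] == sym})

def lr0_states():
    start = closure({('S', 'E', 0)})
    states, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for sym in {r[d] for _, r, d in s if d < len(r)}:
            t = goto(s, sym)
            if t and t not in states:
                states.add(t)
                todo.append(t)
    return states
```

This is the "single step" construction mentioned above: the subset construction is interleaved with ǫ-closure over marked productions.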

Figure 7.3: NFA for C_G1 (the states are the marked productions of G1, with ǫ-transitions from each state A → α.Bβ to the states B → .γ).

Figure 7.4: DFA for C_G1 (states 1-6, with transitions 1 -a-> 2, 1 -E-> 4, 2 -a-> 2, 2 -E-> 3, 2 -b-> 5, 3 -b-> 6).

Figure 7.5: DFA for C_G2 (states 1-8; transitions to the dead state omitted).

is shown below:

1: S → .E,  E → .E+T,  E → .T,  T → .T∗a,  T → .a
2: E → E.+T,  S → E.
3: E → T.,  T → T.∗a
4: T → a.
5: E → E+.T,  T → .T∗a,  T → .a
6: T → T∗.a
7: E → E+T.,  T → T.∗a
8: T → T∗a.

Note that some of the marked productions are more important than others. For example, in state 5, the marked production

E → E+.T

determines the state. The other two items T → .T∗a and T → .a are obtained by ǫ-closure.

We call a marked production of the form A → α.β, where α ≠ ǫ, a core item. A marked production of the form A → β. is called a reduce item. Reduce items only appear in final

states. If we also call S' → .S a core item, we observe that every state is completely determined by its subset of core items. The other items in the state are obtained via ǫ-closure. We can take advantage of this fact to write a more efficient algorithm constructing the LR(0)-automaton in a single pass.

Also observe the so-called spelling property: all the transitions entering any given state have the same label.

Given a state s, if s contains both a reduce item A → γ. and a shift item B → α.aβ, where a ∈ Σ, we say that there is a shift/reduce conflict in state s on input a. If s contains two (distinct) reduce items A1 → γ1. and A2 → γ2., we say that there is a reduce/reduce conflict in state s.

A grammar is said to be LR(0) if the DFA DCG has no conflicts. This is the case for the grammar G1. However, it should be emphasized that this is extremely rare in practice. The grammar G1 is just a very nice, but toy, example. In fact, G2 is not LR(0).

To eliminate conflicts, one can either compute SLR(1)-lookahead sets, using FOLLOW sets (see Section 7.6), or sharper lookahead sets, the LALR(1) sets (see Section 7.9). For example, the computation of SLR(1)-lookahead sets for G2 will eliminate the conflicts. We will describe methods for computing SLR(1)-lookahead sets and LALR(1)-lookahead sets in Sections 7.6, 7.9, and 7.10. A more drastic measure is to compute the LR(1)-automaton, in which the states incorporate lookahead symbols (see Section 7.11). However, as we said before, this is not a practical method for large grammars.

In order to motivate the construction of a shift/reduce parser from the DFA accepting C_G, let us consider a rightmost derivation for w = aaabbb in reverse order, for the grammar

0: S → E
1: E → aEb
2: E → ab

aaabbb  = α1β1v1   (α1 = aa, β1 = ab, v1 = bb)
aaEbb   = α1B1v1                               E → ab
aaEbb   = α2β2v2   (α2 = a, β2 = aEb, v2 = b)
aEb     = α2B2v2                               E → aEb
aEb     = α3β3v3   (α3 = v3 = ǫ, β3 = aEb)
E       = α3B3v3   (α3 = v3 = ǫ)               E → aEb
E       = α4β4v4   (α4 = v4 = ǫ, β4 = E)
S       = α4B4v4   (α4 = v4 = ǫ)               S → E

Figure 7.6: DFA for C_G (the DFA of Figure 7.4).

Observe that the strings αiβi, for i = 1, 2, 3, 4, are all accepted by the DFA for C_G shown in Figure 7.6.

Also, every step from αiβivi to αiBivi is the inverse of the derivation step using the production Bi → βi, and the marked production Bi → βi. is one of the reduce items in the final state reached after processing αiβi with the DFA for C_G.

This suggests that we can parse w = aaabbb by recursively running the DFA for C_G.

The first time (which corresponds to step 1) we run the DFA for C_G on w, some string α1β1 is accepted and the remaining input is v1. Then, we reduce β1 to B1 using the production B1 → β1 corresponding to some reduce item B1 → β1. in the final state s1 reached on input α1β1.

We now run the DFA for C_G on input α1B1v1. The string α2β2 is accepted, and we have α1B1v1 = α2β2v2. We reduce β2 to B2 using the production B2 → β2 corresponding to some reduce item B2 → β2. in the final state s2 reached on input α2β2.

We now run the DFA for C_G on input α2B2v2, and so on.

At the (i+1)th step (i >= 1), we run the DFA for C_G on input αiBivi. The string αi+1βi+1 is accepted, and we have αiBivi = αi+1βi+1vi+1. We reduce βi+1 to Bi+1 using the production Bi+1 → βi+1 corresponding to some reduce item Bi+1 → βi+1. in the final state si+1 reached on input αi+1βi+1. The string βi+1 in αi+1βi+1vi+1 is often called a handle.

Then we run again the DFA for C_G on input αi+1Bi+1vi+1.

Now, because the DFA for C_G is deterministic, there is no need to rerun it on the entire string αi+1Bi+1vi+1, because on input αi+1 it will take us to the same state, say pi+1, that it reached on input αi+1βi+1vi+1!

The trick is that we can use a stack to keep track of the sequence of states used to process αi+1βi+1. Then, to perform the reduction of αi+1βi+1 to αi+1Bi+1, we simply pop a number of states equal to |βi+1|, uncovering a new state pi+1 on top of the stack, and from state pi+1 we perform the transition on input Bi+1 to a state qi+1 (in the DFA for C_G), so we push state qi+1 on the stack, which now contains the sequence of states on input αi+1Bi+1 that takes us to qi+1. Then we resume scanning vi+1 using the DFA for C_G, pushing each state being traversed on the stack until we hit a final state. At this point we find the new string αi+2βi+2 that leads to a final state, and we continue as before.

The process stops when the remaining input vi+1 becomes empty and when the reduce item S' → S. (here, S → E.) belongs to the final state si+1.

Figure 7.7: DFA for C_G (the DFA of Figure 7.4).

For example, on input α2β2 = aaEb, we have the sequence of states

1 2 2 3 6.

State 6 contains the marked production E → aEb., so we pop the three topmost states 2 3 6, obtaining the stack

1 2,

and then we make the transition from state 2 on input E, which takes us to state 3, so we push 3 on top of the stack, obtaining

1 2 3.

We continue from state 3 on input b.

Basically, the recursive calls to the DFA for C_G are implemented using a stack.

What is not clear is, during step i+1, when reaching a final state si+1, how do we know which production Bi+1 → βi+1 to use in the reduction step? Indeed, state si+1 could contain several reduce items Bi+1 → βi+1..

This is where we assume that we were able to compute some lookahead information, that is, for every final state s and every input a, we know which unique production n: Bi+1 → βi+1 applies. This is recorded in a table named action, such that action(s, a) = rn, where "r" stands for reduce. Typically we compute SLR(1) or LALR(1) lookahead sets. Otherwise, we could pick some reducing production nondeterministically and use backtracking. This works, but the running time may be exponential.

The DFA for C_G and the action table giving us the reductions can be combined to form a bigger action table which specifies completely how the parser using a stack works. This kind of parser, called a shift-reduce parser, is discussed in the next section. In order to make it easier to compute the reduce entries in the parsing table, we assume that the end of the input w is signalled by a special endmarker traditionally denoted by $.

7.2 Shift/Reduce Parsers

A shift/reduce parser is a modified kind of DPDA. Firstly, push moves, called shift moves, are restricted so that exactly one symbol is pushed on top of the stack. Secondly, more powerful kinds of pop moves, called reduce moves, are allowed. During a reduce move, a finite number of stack symbols may be popped off the stack, and the last step of a reduce move, called a goto move, consists of pushing one symbol on top of the new topmost symbol in the stack. Shift/reduce parsers use parsing tables constructed from the LR(0)-characteristic automaton DCG associated with the grammar. The shift and goto moves come directly from the transition table of DCG, but the determination of the reduce moves requires the computation of lookahead sets.

The SLR(1) lookahead sets are obtained from some sets called the FOLLOW sets (see Section 7.6), and the LALR(1) lookahead sets LA(s, A → γ) require fancier FOLLOW sets (see Section 7.9).

The construction of shift/reduce parsers is made simpler by assuming that the end of input strings w ∈ Σ* is indicated by the presence of an endmarker, usually denoted $, and assumed not to belong to Σ.

Consider the grammar G1 of Example 1, where we have numbered the productions 0, 1, 2:

0: S → E
1: E → aEb
2: E → ab

The parsing tables associated with the grammar G1 are shown below:

      a    b    $    E
1    s2              4
2    s2   s5         3
3         s6
4              acc
5    r2   r2   r2
6    r1   r1   r1

Figure 7.8: DFA for C_G (the DFA of Figure 7.4).

Entries of the form si are shift actions, where i denotes one of the states, and entries of the form rn are reduce actions, where n denotes a production number (not a state). The special action acc means accept, and signals the successful completion of the parse. Entries of the form i, in the rightmost column, are goto actions. All blank entries are error entries, and mean that the parse should be aborted.

We will use the notation action(s, a) for the entry corresponding to state s and terminal a ∈ Σ ∪ {$}, and goto(s, A) for the entry corresponding to state s and nonterminal A ∈ N ∪ {S'}.

Assuming that the input is w$, we now describe in more detail how a shift/reduce parser proceeds. The parser uses a stack in which states are pushed and popped. Initially, the stack contains state 1 and the cursor pointing to the input is positioned on the leftmost symbol. There are four possibilities:

(1) If action(s, a) = sj, then push state j on top of the stack, and advance to the next input symbol in w$. This is a shift move.

(2) If action(s, a) = rn, then do the following: First, determine the length k = |γ| of the righthand side of the production n: A → γ. Then, pop the topmost k symbols off the stack (if k = 0, no symbols are popped). If p is the new top state on the stack (after the k pop moves), push the state goto(p, A) on top of the stack, where A is the

lefthand side of the reducing production A → γ. Do not advance the cursor in the current input. This is a reduce move.

(3) If action(s, $) = acc, then accept. The input string w belongs to L(G).

(4) In all other cases, error; abort the parse. The input string w does not belong to L(G).

Observe that no explicit state control is needed. The current state is always the current topmost state in the stack.

We illustrate below a parse of the input aaabbb$.

stack      remaining input    action
1          aaabbb$            s2
12         aabbb$             s2
122        abbb$              s2
1222       bbb$               s5
12225      bb$                r2
1223       bb$                s6
12236      b$                 r1
123        b$                 s6
1236       $                  r1
14         $                  acc

Observe that the sequence of reductions read from bottom up yields a rightmost derivation of aaabbb from E (or from S, if we view the action acc as the reduction by the production S → E). This is a general property of LR-parsers.

The SLR(1) reduce entries in the parsing tables are determined as follows: For every state s containing a reduce item B → γ., if B → γ is the production number n, enter the action rn for state s and every terminal a ∈ FOLLOW(B). If the resulting shift/reduce parser has no conflicts, we say that the grammar is SLR(1).

For the LALR(1) reduce entries, enter the action rn for state s and production n: B → γ, for all a ∈ LA(s, B → γ). Similarly, if the shift/reduce parser obtained using LALR(1)-lookahead sets has no conflicts, we say that the grammar is LALR(1).
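The four possibilities above can be transcribed almost literally into code. The sketch below drives the parsing table of G1 given earlier (states 1-6; productions 1: E → aEb and 2: E → ab, stored with their left-hand sides and right-hand-side lengths); the table entries themselves are copied from that table, not recomputed:

```python
# action and goto tables for G1 (0: S -> E, 1: E -> aEb, 2: E -> ab).
ACTION = {
    (1, 'a'): 's2',
    (2, 'a'): 's2', (2, 'b'): 's5',
    (3, 'b'): 's6',
    (4, '$'): 'acc',
    (5, 'a'): 'r2', (5, 'b'): 'r2', (5, '$'): 'r2',
    (6, 'a'): 'r1', (6, 'b'): 'r1', (6, '$'): 'r1',
}
GOTO = {(1, 'E'): 4, (2, 'E'): 3}
PRODS = {1: ('E', 3), 2: ('E', 2)}   # production n -> (lhs, |rhs|)

def parse(w: str) -> bool:
    stack, inp, pos = [1], w + '$', 0
    while True:
        act = ACTION.get((stack[-1], inp[pos]))
        if act is None:
            return False                  # blank entry: abort the parse
        if act == 'acc':
            return True                   # successful completion
        if act[0] == 's':                 # shift: push state, advance cursor
            stack.append(int(act[1:]))
            pos += 1
        else:                             # reduce by production n
            lhs, k = PRODS[int(act[1:])]
            del stack[len(stack) - k:]    # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])  # goto move
```

Running parse("aaabbb") goes through exactly the stack/action sequence shown in the trace above.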

7.3 Computation of FIRST

In order to compute the FOLLOW sets, we first need to compute the FIRST sets! For simplicity of exposition, we first assume that grammars have no ǫ-rules. The general case will be treated in Section 7.10.

Given a context-free grammar G = (V, Σ, P, S') (augmented with a start production S' → S), for every nonterminal A ∈ N = V − Σ, let

FIRST(A) = {a | a ∈ Σ, A =⇒+ aα, for some α ∈ V*}.

For a terminal a ∈ Σ, let FIRST(a) = {a}.

The key to the computation of FIRST(A) is the following observation: a is in FIRST(A) if either a is in

INITFIRST(A) = {a | a ∈ Σ, A → aα ∈ P, for some α ∈ V*},

or a is in

⋃{FIRST(B) | A → Bα ∈ P, for some α ∈ V*, B ≠ A}.

Note that the second assertion is true because, if B =⇒+ aδ, then

A =⇒ Bα =⇒+ aδα,

and so, FIRST(B) ⊆ FIRST(A) whenever A → Bα ∈ P, with A ≠ B. Hence, the FIRST sets are the least solution of the following set of recursive equations: For each nonterminal A,

FIRST(A) = INITFIRST(A) ∪ ⋃{FIRST(B) | A → Bα ∈ P, A ≠ B}.

In order to explain the method for solving such systems, we will formulate the problem in more general terms, but first, we describe a naive version of the shift/reduce algorithm that hopefully demystifies the optimized version described in Section 7.2.

7.4 The Intuition Behind the Shift/Reduce Algorithm

Let DCG = (K, V, δ, q0, F) be the DFA accepting the regular language C_G, and let δ* be the extension of δ to K × V*. Let us assume that the grammar G is either SLR(1) or LALR(1), which implies that it has no shift/reduce or reduce/reduce conflicts. We can use the DFA DCG accepting C_G recursively to parse L(G). The function CG is defined as follows: Given any string µ ∈ V*,

CG(µ) = error, if δ*(q0, µ) = error;
CG(µ) = (δ*(q0, θ), θ, v), if δ*(q0, θ) ∈ F, µ = θv, and θ is the shortest prefix of µ s.t. δ*(q0, θ) ∈ F.

The naive shift-reduce algorithm is shown below:

accept := true; stop := false;
µ := w$; {input string}
while ¬stop do
  if CG(µ) = error then

  begin
    stop := true; accept := false
  end
  else
  begin
    Let (q, θ, v) = CG(µ);
    Let B → β be the production such that action(q, FIRST(v)) = B → β, and let θ = αβ;
    if B → β = S' → S then
      stop := true
    else
      µ := αBv {reduction}
    endif
  end
  endif
endwhile

The idea is to recursively run the DFA DCG on the sentential form µ, until the first final state q is hit. Then, the sentential form µ must be of the form αβv, where v is a terminal string ending in $, and the final state q contains a reduce item of the form B → β, with action(q, FIRST(v)) = B → β. Thus, we can reduce µ = αβv to αBv, since we have found a rightmost derivation step, and repeat the process.

Note that the major inefficiency of the algorithm is that when a reduction is performed, the prefix α of µ is reparsed entirely by DCG. Since DCG is deterministic, the sequence of states obtained on input α is uniquely determined. If we keep the sequence of states produced on input θ by DCG in a stack, then it is possible to avoid reparsing α. Indeed, all we have to do is update the stack so that just before applying DCG to αBv, the sequence of states in the stack is the sequence obtained after parsing α. This stack is obtained by popping the |β| topmost states and performing an update which is just a goto move. This is the standard version of the shift/reduce algorithm!

7.5 The Graph Method for Computing Fixed Points

Let X be a finite set representing the domain of the problem (in Section 7.3 above, X = Σ), let F(1), ..., F(N) be N sets to be computed, and let I(1), ..., I(N) be N given subsets of X. The sets I(1), ..., I(N) are the initial sets. We also have a directed graph G whose set of nodes is {1, ..., N}, and which represents relationships among the sets F(i), where 1 <= i <= N. The graph G has no parallel edges and no loops, but it may have cycles. If there is an edge from i to j, this is denoted by i G j (note that the absence of loops means that i G i never holds). Also, the existence of a path from i to j is denoted by i G+ j. The graph G represents a relation, and G+ is the graph of the transitive closure of this relation. The existence of a path from i to j, including the null path, is denoted by i G* j. Hence, G* is the

reflexive and transitive closure of G. We want to solve for the least solution of the system of recursive equations:

F(i) = I(i) ∪ ⋃{F(j) | i G j, i ≠ j}, 1 <= i <= N.

Since (2^X)^N is a complete lattice under the inclusion ordering (which means that every family of subsets has a least upper bound, namely, the union of this family), it is an ω-complete poset, and since the function F: (2^X)^N → (2^X)^N induced by the system of equations is easily seen to preserve least upper bounds of ω-chains, the least solution of the system can be computed by the standard fixed point technique (as explained in Section 3.7 of the class notes). We simply compute the sequence of approximations (F^k(1), ..., F^k(N)), where

F^0(i) = ∅, 1 <= i <= N,

and

F^(k+1)(i) = I(i) ∪ ⋃{F^k(j) | i G j, i ≠ j}, 1 <= i <= N.

It is easily seen that we can stop at k = N − 1, and the least solution is given by

F(i) = F^1(i) ∪ F^2(i) ∪ ... ∪ F^N(i), 1 <= i <= N.

However, the above expression can be simplified to

F(i) = ⋃{I(j) | i G* j}, 1 <= i <= N.

This last expression shows that in order to compute F(i), it is necessary to compute the union of all the initial sets I(j) reachable from i (including i). Hence, any transitive closure algorithm or graph traversal algorithm will do. For simplicity and for pedagogical reasons, we use a depth-first search algorithm.

Going back to FIRST, we see that all we have to do is to compute the INITFIRST sets, the graph GFIRST, and then use the graph traversal algorithm. The graph GFIRST is computed as follows: The nodes are the nonterminals, and there is an edge from A to B (A ≠ B) if and only if there is a production of the form A → Bα, for some α ∈ V*.

Example 1. Computation of the FIRST sets for the grammar G1 given by the rules:

S → E$
E → E + T
E → T
T → T ∗ F
T → F
F → (E)
F → −T
F → a.
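The simplified expression F(i) = ⋃{I(j) | i G* j} can be computed directly by a graph traversal. A minimal sketch (hypothetical encoding: the graph as a successor dictionary, the initial sets as a dictionary of Python sets), using a depth-first search instead of the iterated approximations F^k:

```python
def solve(succ: dict, init: dict) -> dict:
    """Least solution of F(i) = I(i) ∪ ⋃{F(j) | i G j}:
    for each node i, union the initial sets of all nodes reachable from i."""
    def reach(i, seen):
        seen.add(i)
        for j in succ.get(i, ()):
            if j not in seen:
                reach(j, seen)
        return seen
    return {i: set().union(*(init[j] for j in reach(i, set())))
            for i in init}
```

The `seen` set makes the traversal terminate even when the graph has cycles, which is exactly the situation where the naive substitution of equations into each other would loop.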

Figure 7.9: Graph GFIRST for G1 (edges E → T and T → F).

We get

INITFIRST(E) = ∅, INITFIRST(T) = ∅, INITFIRST(F) = {(, −, a}.

The graph GFIRST is shown in Figure 7.9. We obtain the following FIRST sets:

FIRST(E) = FIRST(T) = FIRST(F) = {(, −, a}.

7.6 Computation of FOLLOW

Recall the definition of FOLLOW(A) for a nonterminal A:

FOLLOW(A) = {a | a ∈ Σ, S' =⇒+ αAaβ, for some α, β ∈ V*}.

Note that a is in FOLLOW(A) if either a is in

INITFOLLOW(A) = {a | a ∈ Σ, B → αAXβ ∈ P, a ∈ FIRST(X), α, β ∈ V*},

or a is in

⋃{FOLLOW(B) | B → αA ∈ P, α ∈ V*, A ≠ B}.

Indeed, if S' =⇒+ λBaρ, then

S' =⇒+ λBaρ =⇒ λαAaρ,

and so, FOLLOW(B) ⊆ FOLLOW(A) whenever B → αA is in P, with A ≠ B. Hence, the FOLLOW sets are the least solution of the set of recursive equations: For all nonterminals A,

FOLLOW(A) = INITFOLLOW(A) ∪ ⋃{FOLLOW(B) | B → αA ∈ P, α ∈ V*, A ≠ B}.

According to the method explained above, we just have to compute the INITFOLLOW sets (using FIRST) and the graph GFOLLOW, which is computed as follows: The nodes are the nonterminals and there is an edge from A to B (A ≠ B) if and only if there is a production
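Instantiating the graph method for FIRST on this grammar takes only a few lines. The sketch below hard-codes INITFIRST and GFIRST for the Example 1 grammar as computed above (a hypothetical encoding of the sets as Python dictionaries) and performs the reachability union:

```python
# INITFIRST and GFIRST for S -> E$, E -> E+T | T, T -> T*F | F,
# F -> (E) | -T | a, as computed in the text.
INITFIRST = {'E': set(), 'T': set(), 'F': {'(', '-', 'a'}}
GFIRST = {'E': {'T'}, 'T': {'F'}, 'F': set()}   # A -> B iff A -> B alpha in P

def first_sets() -> dict:
    first = {}
    for a in INITFIRST:
        seen, todo = set(), [a]
        while todo:                       # nodes of GFIRST reachable from a
            n = todo.pop()
            if n not in seen:
                seen.add(n)
                todo.extend(GFIRST[n])
        first[a] = set().union(*(INITFIRST[n] for n in seen))
    return first
```

The FOLLOW computation of Section 7.6 is the same traversal run on the dual graph GFOLLOW with the INITFOLLOW sets as initial sets.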

Figure 7.10: Graph GFOLLOW for G1 (edges T → E, T → F, and F → T).

of the form B → αA in P, for some α ∈ V*. Note the duality between the construction of the graph GFIRST and the graph GFOLLOW.

Example 2. Computation of the FOLLOW sets for the grammar G1.

INITFOLLOW(E) = {+, ), $}, INITFOLLOW(T) = {∗}, INITFOLLOW(F) = ∅.

The graph GFOLLOW is shown in Figure 7.10. We have

FOLLOW(E) = INITFOLLOW(E),
FOLLOW(T) = INITFOLLOW(T) ∪ INITFOLLOW(E) ∪ INITFOLLOW(F),
FOLLOW(F) = INITFOLLOW(F) ∪ INITFOLLOW(T) ∪ INITFOLLOW(E),

and so

FOLLOW(E) = {+, ), $}, FOLLOW(T) = {+, ∗, ), $}, FOLLOW(F) = {+, ∗, ), $}.

7.7 Algorithm Traverse

The input is a directed graph Gr having N nodes, and a family of initial sets I[i], 1 <= i <= N. We assume that a function successors is available, which returns, for each node n in the graph, the list successors[n] of all immediate successors of n. The output is the list of sets F[i], 1 <= i <= N, solution of the system of recursive equations of Section 7.5. Hence,

F[i] = ⋃{I[j] | i Gr* j}, 1 <= i <= N.

The procedure Reachable visits all nodes reachable from a given node. It uses a stack STACK and a boolean array VISITED to keep track of which nodes have been visited. The procedures Reachable and traverse are shown in Figure 7.11.

Procedure Reachable(Gr: graph; startnode: node; I: list of sets; var F: list of sets);
var currentnode, succnode, i: node; STACK: stack;
    VISITED: array[1..N] of boolean;
begin
  for i := 1 to N do VISITED[i] := false;
  STACK := EMPTY;
  push(STACK, startnode);
  while STACK ≠ EMPTY do
  begin
    currentnode := top(STACK);
    pop(STACK);
    VISITED[currentnode] := true;
    for each succnode ∈ successors(currentnode) do
      if ¬VISITED[succnode] then
      begin
        push(STACK, succnode);
        F[startnode] := F[startnode] ∪ I[succnode]
      end
  end
end

The sets F[i], 1 <= i <= N, are computed as follows:

begin
  for i := 1 to N do F[i] := I[i];
  for startnode := 1 to N do
    Reachable(Gr, startnode, I, F)
end

Figure 7.11: Algorithm traverse

7.8 More on LR(0)-Characteristic Automata

Let G = (V, Σ, P, S') be an augmented context-free grammar with augmented start production S' → S$ (where S' only occurs in the augmented production). The rightmost derivation relation is denoted by =⇒rm.

Recall that the set C_G of characteristic strings for the grammar G is defined by

C_G = {αβ ∈ V* | S' =⇒*rm αAv =⇒rm αβv, αβ ∈ V*, v ∈ Σ*}.

The fundamental property of LR-parsing, due to D. Knuth, is stated in the following theorem:

Theorem 7.1. Let G be a context-free grammar and assume that every nonterminal derives some terminal string. The language C_G (over V*) is a regular language. Furthermore, a deterministic automaton DCG accepting C_G can be constructed from G.

The construction of DCG can be found in various places, including the book on Compilers by Aho, Sethi and Ullman. We explained this construction in Section 7.1. The proof that the NFA NCG constructed as indicated in Section 7.1 is correct, i.e., that it accepts precisely C_G, is nontrivial, but not really hard either. This will be the object of a homework assignment! However, note a subtle point: The construction of NCG is only correct under the assumption that every nonterminal derives some terminal string. Otherwise, the construction could yield an NFA NCG accepting strings not in C_G.

Recall that the states of the characteristic automaton CGA are sets of items (or marked productions), where an item is a production with a dot anywhere in its right-hand side. Note that in constructing CGA, it is not necessary to include the state {S' → S$.} (the endmarker $ is only needed to compute the lookahead sets).

If a state p contains a marked production of the form A → β., where the dot is the rightmost symbol, state p is called a reduce state and A → β is called a reducing production for p.

Given any state q, we say that a string β ∈ V* accesses q if there is a path from some state p to the state q on input β in the automaton CGA. Given any two states p, q ∈ CGA, for any β ∈ V*, if there is a sequence of transitions in CGA from p to q on input β, this is denoted by

p --β--> q.

The initial state, which is the closure of the item S' → .S$, is denoted by 1. The LALR(1)-lookahead sets are defined in the next section.

In words, LA(q, A → β) consists of the terminal symbols a for which the reduction by production A → β in state q is the correct action (that is, for which the parse will terminate successfully). The LA sets can be computed using the FOLLOW sets defined below. For any state p and any nonterminal A, let

FOLLOW(p, A) = {a ∈ Σ | S' =⇒*rm αAav, α ∈ V*, v ∈ Σ*, and α accesses p}.

Since for any derivation S' =⇒*rm αAav =⇒rm αβav where αβ accesses q, there is a state p such that p →β q and α accesses p, it is easy to see that the following result holds:

Proposition 7.2. For every reduce state q and any reducing production A → β for q, we have

LA(q, A → β) = ∪{FOLLOW(p, A) | p →β q}.

Also, we let LA({S' → S.$}, S' → S$) = FOLLOW(1, S).

Intuitively, when the parser makes the reduction by production A → β in state q, each state p as above is a possible top of stack after the states corresponding to β are popped. Then the parser must read A in state p, and the next input symbol will be one of the symbols in FOLLOW(p, A). The computation of FOLLOW(p, A) is similar to that of FOLLOW(A). First, we compute INITFOLLOW(p, A), given by

INITFOLLOW(p, A) = {a ∈ Σ | ∃q, r, p →A q →a r}.

These are the terminals that can be read in CGA after the "goto transition" on nonterminal A has been performed from p. These sets can be easily computed from CGA. Note that for the state p whose core item is S' → S.$, we have INITFOLLOW(p, S) = {$}.

Next, observe that if B → αA is a production and if S' =⇒*rm λBav where λ accesses p, then

S' =⇒*rm λBav =⇒rm λαAav

where λ accesses p and p →α p'. Hence λα accesses p', and

FOLLOW(p, B) ⊆ FOLLOW(p', A)

whenever there is a production B → αA and p →α p'. From this, the following recursive equations are easily obtained: for all p and all A,

FOLLOW(p, A) = INITFOLLOW(p, A) ∪ ∪{FOLLOW(p', B) | B → αA ∈ P, α ∈ V*, and p' →α p}.

From Section 7.5, we know that these sets can be computed by using the algorithm traverse. All we need is to compute the graph GLA. The nodes of the graph GLA are the pairs (p, A), where p is a state and A is a nonterminal. There is an edge from (p, A) to (p', B) if and only if there is a production of the form B → αA in P for some α ∈ V*, and p' →α p in CGA. Note that it is only necessary to consider nodes (p, A) for which there is a nonterminal transition on A from p. Such pairs can be obtained from the parsing table. Also, using the spelling property, that is, the fact that all transitions entering a given state have the same label, it is possible to compute the relation lookback, defined as follows:

(q, A) lookback (p, A) iff p →β q

for some reduce state q and reducing production A → β. The above considerations show that the FOLLOW sets of Section 7.6 are obtained by ignoring the state component from FOLLOW(p, A).

We now consider the changes that have to be made when ǫ-rules are allowed.

7.10 Computing FIRST, FOLLOW, etc. in the Presence of ǫ-Rules

First, it is necessary to compute the set E of erasable nonterminals, that is, the set of nonterminals A such that A =⇒+ ǫ. We let E be a boolean array and change be a boolean flag. An algorithm for computing E is shown in Figure 7.12.

Then, in order to compute FIRST, we compute

INITFIRST(A) = {a ∈ Σ | A → aα ∈ P, or A → A1 ··· Ak aα ∈ P, for some α ∈ V*, and E(A1) = ··· = E(Ak) = true}.

The graph GFIRST is obtained as follows: the nodes are the nonterminals, and there is an edge from A to B if and only if either there is a production A → Bα, or a production A → A1 ··· Ak Bα, for some α ∈ V*, with E(A1) = ··· = E(Ak) = true. Then, we extend

begin
  for each nonterminal A do E(A) := false;
  for each nonterminal A such that A → ǫ ∈ P do E(A) := true;
  change := true;
  while change do
  begin
    change := false;
    for each A → A1 ··· An ∈ P s.t. E(A1) = ··· = E(An) = true do
      if E(A) = false then
      begin
        E(A) := true;
        change := true
      end
  end
end

Figure 7.12: Algorithm for computing E
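The algorithm of Figure 7.12 translates directly into executable code. The following is a sketch under our own encoding (productions as pairs, with the empty tuple standing for an ǫ right-hand side); the function name erasable is ours, not the text's.

```python
def erasable(nonterminals, productions):
    """Compute the set E of erasable nonterminals (A =>+ epsilon).

    productions: list of (lhs, rhs) pairs, where rhs is a tuple of
    symbols and () encodes the empty right-hand side.
    """
    E = {A: False for A in nonterminals}
    for lhs, rhs in productions:
        if rhs == ():               # A -> epsilon is in P
            E[lhs] = True
    change = True
    while change:
        change = False
        for lhs, rhs in productions:
            # A -> A1 ... An with every Ai already known to be erasable
            if not E[lhs] and rhs and all(E.get(X, False) for X in rhs):
                E[lhs] = True
                change = True
    return {A for A, v in E.items() if v}
```

For the grammar S → AB, A → ǫ, B → A, C → aC, the call returns {S, A, B}: the terminal a blocks C, while erasability propagates from A to B and then to S.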

FIRST to strings in V+, in the obvious way. Given any string β ∈ V+, if |β| = 1, then β = X for some X ∈ V, and FIRST(β) = FIRST(X) as before; else if β = X1 ··· Xn with n ≥ 2 and Xi ∈ V, then

FIRST(β) = FIRST(X1) ∪ ··· ∪ FIRST(Xk),

where k, 1 ≤ k ≤ n, is the largest integer such that E(X1) = ··· = E(X(k−1)) = true (so k = 1 when X1 is not erasable).

To compute FOLLOW, we first compute

INITFOLLOW(A) = {a ∈ Σ | B → αAβ ∈ P, α ∈ V*, β ∈ V+, and a ∈ FIRST(β)}.

The graph GFOLLOW is computed as follows: the nodes are the nonterminals. There is an edge from A to B if either there is a production of the form B → αA, or a production B → αAA1 ··· Ak, for some α ∈ V*, and with E(A1) = ··· = E(Ak) = true.

The computation of the LALR(1) lookahead sets is also more complicated, because another graph is needed in order to compute INITFOLLOW(p, A). First, the graph GLA is defined in the following way: the nodes are still the pairs (p, A), as before, but there is an edge from (p, A) to (p', B) if and only if either there is some production B → αA, for some α ∈ V* and p' →α p, or a production B → αAβ, for some α ∈ V*, β ∈ V+, β =⇒+ ǫ, and p' →α p.

The sets INITFOLLOW(p, A) are computed in the following way. First, let

DR(p, A) = {a ∈ Σ | ∃q, r, p →A q →a r}.

The sets DR(p, A) are the direct read sets. Note that for the state p whose core item is S' → S.$, we have DR(p, S) = {$}. Then,

INITFOLLOW(p, A) = DR(p, A) ∪ {a ∈ Σ | S' =⇒*rm αAβav =⇒*rm αAav, α ∈ V*, β ∈ V+, β =⇒+ ǫ, α accesses p}.

The set INITFOLLOW(p, A) is the set of terminals that can be read before any handle containing A is reduced. The graph GREAD is defined as follows: the nodes are the pairs (p, A), and there is an edge from (p, A) to (r, C) if and only if p →A r and r →C s, for some s, with E(C) = true. Then, it is not difficult to show that the INITFOLLOW sets are the least solution of the set of recursive equations:

INITFOLLOW(p, A) = DR(p, A) ∪ ∪{INITFOLLOW(r, C) | (p, A) GREAD (r, C)}.
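The extension of FIRST to strings in the presence of erasable symbols can be sketched as follows; the encoding (FIRST as a dictionary over single symbols, E as a set) and the function name first_of_string are assumptions of ours.

```python
def first_of_string(beta, FIRST, E):
    """FIRST of a string X1...Xn: the union of FIRST(Xi) over the longest
    erasable prefix, plus the FIRST of the first non-erasable symbol.
    Only terminal symbols are returned (epsilon is not tracked here)."""
    out = set()
    for X in beta:
        out |= FIRST[X]
        if X not in E:        # X is not erasable: later symbols contribute nothing
            break
    return out
```

For example, with FIRST(A) = {a}, FIRST(B) = {b}, FIRST(c) = {c} and E = {A}, the string ABc has FIRST {a, b}, while Bc has FIRST {b}.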

Hence the INITFOLLOW sets can be computed using the algorithm traverse on the graph GREAD and the sets DR(p, A), and then the FOLLOW sets can be computed using traverse again, with the graph GLA and the sets INITFOLLOW. Finally, the sets LA(q, A → β) are computed from the FOLLOW sets using the graph lookback.

From Section 7.5, we note that F(i) = F(j) whenever there is a path from i to j and a path from j to i, that is, whenever i and j are strongly connected. Hence, the solution of the system of recursive equations can be computed more efficiently by finding the maximal strongly connected components of the graph G, since F has the same value on each strongly connected component. This is the approach followed by DeRemer and Pennello in: Efficient Computation of LALR(1) Look-Ahead Sets, by F. DeRemer and T. Pennello, TOPLAS, Vol. 4, No. 4, October 1982, pp. 615-649.

We now give an example of a grammar which is LALR(1) but not SLR(1).

Example 3. The grammar G2 is given by:

S' → S$
S → L = R
S → R
L → *R
L → id
R → L

The states of the characteristic automaton CGA2 are:

1: S' → .S$
   S → .L = R
   S → .R
   L → .*R
   L → .id
   R → .L
2: S' → S.$
3: S → L. = R
   R → L.
4: S → R.
5: L → *.R
   R → .L
   L → .*R
   L → .id
6: L → id.
7: S → L = .R
   R → .L
   L → .*R
   L → .id
8: L → *R.
9: R → L.
10: S → L = R.

We find that

INITFIRST(S) = INITFIRST(L) = {*, id}
INITFIRST(R) = ∅.

The graph GFIRST is shown in Figure 7.13. Then, we find that

FIRST(S) = {*, id}
FIRST(L) = {*, id}
FIRST(R) = {*, id}.

Figure 7.13: The graph GFIRST (nodes S, L, R)

Figure 7.14: The graph GFOLLOW (nodes S, L, R)

We also have

INITFOLLOW(S) = {$}
INITFOLLOW(L) = {=}
INITFOLLOW(R) = ∅.

The graph GFOLLOW is shown in Figure 7.14. Then, we find that

FOLLOW(S) = {$}
FOLLOW(L) = {=, $}
FOLLOW(R) = {=, $}.

Note that there is a shift/reduce conflict in state 3 on input =, since there is a shift on input = (because S → L. = R is in state 3), and a reduce for R → L, since = is in FOLLOW(R). However, as we shall see, the conflict is resolved if the LALR(1) lookahead sets are computed. The graph GLA is shown in Figure 7.15.

Figure 7.15: The graph GLA (nodes (1,S), (1,R), (5,R), (7,R), (1,L), (5,L), (7,L))

We get the following INITFOLLOW and FOLLOW sets:

INITFOLLOW(1, S) = {$}    FOLLOW(1, S) = {$}
INITFOLLOW(1, R) = ∅      FOLLOW(1, R) = {$}
INITFOLLOW(1, L) = {=}    FOLLOW(1, L) = {=, $}
INITFOLLOW(5, R) = ∅      FOLLOW(5, R) = {=, $}
INITFOLLOW(5, L) = ∅      FOLLOW(5, L) = {=, $}
INITFOLLOW(7, R) = ∅      FOLLOW(7, R) = {$}
INITFOLLOW(7, L) = ∅      FOLLOW(7, L) = {$}.

Thus, we get

LA(2, S' → S$) = FOLLOW(1, S) = {$}
LA(3, R → L) = FOLLOW(1, R) = {$}
LA(4, S → R) = FOLLOW(1, S) = {$}
LA(6, L → id) = FOLLOW(1, L) ∪ FOLLOW(5, L) ∪ FOLLOW(7, L) = {=, $}
LA(8, L → *R) = FOLLOW(1, L) ∪ FOLLOW(5, L) ∪ FOLLOW(7, L) = {=, $}
LA(9, R → L) = FOLLOW(5, R) ∪ FOLLOW(7, R) = {=, $}
LA(10, S → L = R) = FOLLOW(1, S) = {$}.

Since LA(3, R → L) does not contain =, the conflict is resolved.
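Proposition 7.2 can be checked mechanically on this example. The sketch below encodes only the transitions on L of the characteristic automaton and the relevant FOLLOW(p, R) sets; the dictionary encoding and the function name la are ours.

```python
def la(q, lhs, beta, trans, follow):
    """LA(q, A -> beta) = union of FOLLOW(p, A) over states p with a path
    p --beta--> q in the characteristic automaton (Proposition 7.2).
    trans maps (state, symbol) -> state."""
    def goto(p, seq):
        for X in seq:
            p = trans.get((p, X))
            if p is None:
                return None
        return p
    states = {p for (p, _) in trans} | set(trans.values())
    result = set()
    for p in states:
        if goto(p, beta) == q:          # p accesses q via beta
            result |= follow[(p, lhs)]
    return result

# Transitions on L in the example automaton, and the FOLLOW(p, R) sets:
trans = {(1, "L"): 3, (5, "L"): 9, (7, "L"): 9}
follow = {(1, "R"): {"$"}, (5, "R"): {"=", "$"}, (7, "R"): {"$"}}
```

With these tables, la(9, "R", ("L",), trans, follow) yields {=, $} while la(3, "R", ("L",), trans, follow) yields only {$}, matching the resolution of the conflict in state 3.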

(A → α.aβ, b) →a (A → αa.β, b)

Figure 7.16: Transition on terminal input a

7.11 LR(1)-Characteristic Automata

We conclude this brief survey on LR-parsing by describing the construction of LR(1)-parsers. The new ingredient is that when we construct an NFA accepting CG, we incorporate lookahead symbols into the states. Thus, a state is a pair (A → α.β, b), where A → α.β is a marked production, as before, and b ∈ Σ ∪ {$} is a lookahead symbol. The new twist in the construction of the nondeterministic characteristic automaton is the following:

The start state is (S' → .S, $), and the transitions are defined as follows:

(a) For every terminal a ∈ Σ, there is a transition on input a from state (A → α.aβ, b) to the state (A → αa.β, b) obtained by shifting the dot (where a = b is possible). Such a transition is shown in Figure 7.16.

(b) For every nonterminal B ∈ N, there is a transition on input B from state (A → α.Bβ, b) to state (A → αB.β, b) (obtained by shifting the dot), and transitions on input ǫ (the empty string) to all states (B → .γ, a), for all productions B → γ with left-hand side B and all a ∈ FIRST(βb). Such transitions are shown in Figure 7.17.

(c) A state is final if and only if it is of the form (A → β., b) (that is, the dot is in the rightmost position).

Example 4. Consider the grammar G3 given by:

0: S → E
1: E → aEb
2: E → ǫ

(A → α.Bβ, b) →B (A → αB.β, b), with ǫ-transitions to the states (B → .γ, a)

Figure 7.17: Transitions from state (A → α.Bβ, b)

The result of making the NFA for CG3 deterministic is shown in Figure 7.18 (where transitions to the dead state have been omitted). The internal structure of the states 1, ..., 8 is shown below:

1: S → .E, $
   E → .aEb, $
   E → ., $
2: E → a.Eb, $
   E → .aEb, b
   E → ., b
3: E → a.Eb, b
   E → .aEb, b
   E → ., b
4: E → aE.b, $
5: E → aEb., $
6: E → aE.b, b
7: E → aEb., b
8: S → E., $

The LR(1)-shift/reduce parser associated with DCG is built as follows: the shift and goto entries come directly from the transitions of DCG, and for every state s, for every item

(A → γ., b) in s, we enter the entry rn for state s and input b, where A → γ is production number n. If the resulting parser has no conflicts, we say that the grammar is an LR(1) grammar. The LR(1)-shift/reduce parser for G3 is shown below (Figure 7.18 shows the underlying DFA for CG3):

       a     b     $     E
  1   s2          r2     8
  2   s3    r2           4
  3   s3    r2           6
  4         s5
  5               r1
  6         s7
  7         r1
  8               acc

Observe that there are three pairs of states, (2,3), (4,6), and (5,7), where both states in a common pair only differ by the lookahead symbols. We can merge the states corresponding to each pair, because the marked items are the same, but now we have to allow lookahead sets. Thus, the merging of (2,3) yields

2: E → a.Eb, {b, $}
   E → .aEb, {b}
   E → ., {b},

the merging of (4,6) yields

3: E → aE.b, {b, $},

the merging of (5,7) yields

4: E → aEb., {b, $}.

We obtain a merged DFA with only five states, and the corresponding shift/reduce parser is given below:

       a     b     $     E
  1   s2          r2     8
  2   s2    r2           3
  3         s4
  4         r1    r1
  8               acc

The reader should verify that this is the LALR(1)-parser. The reader should also check that the SLR(1)-parser is given below:

       a     b     $     E
  1   s2    r2    r2     5
  2   s2    r2    r2     3
  3         s4
  4         r1    r1
  5               acc

The difference between the two parsing tables is that the LALR(1)-lookahead sets are sharper than the SLR(1)-lookahead sets. This is because the computation of the LALR(1)-lookahead sets uses a sharper version of the FOLLOW sets.

It can also be shown that if a grammar is LALR(1), then the merging of states of an LR(1)-parser always succeeds and yields the LALR(1) parser. Of course, this is a very inefficient way of producing LALR(1) parsers, and much better methods exist, such as the graph method described in these notes. However, there are cases where the merging fails. Sufficient conditions for successful merging have been investigated, but there is still room for research in this area.
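The merging step itself is purely mechanical: group LR(1) states by their core (the set of marked items) and union the lookaheads. A sketch under our own encoding (items as strings, states as sets of (item, lookahead) pairs; the function name merge_by_core is ours):

```python
def merge_by_core(states):
    """LALR(1) by merging: group LR(1) states that have the same core
    and union their lookaheads.

    states: dict mapping state name -> set of (item, lookahead) pairs.
    Returns a list of (merged state names, {item: lookahead set})."""
    groups = {}   # core (frozenset of items) -> (names, lookahead table)
    for name, items in sorted(states.items()):
        core = frozenset(item for item, _ in items)
        names, table = groups.setdefault(core, ([], {it: set() for it in core}))
        names.append(name)
        for item, la in items:
            table[item].add(la)
    return list(groups.values())

# States 4/6 and 5/7 of the example differ only in their lookaheads:
lr1 = {
    4: {("E -> aE.b", "$")},
    6: {("E -> aE.b", "b")},
    5: {("E -> aEb.", "$")},
    7: {("E -> aEb.", "b")},
}
merged = merge_by_core(lr1)
```

Here merged contains two groups: states 4 and 6 collapse to one state with lookahead set {b, $} on the item E → aE.b, and likewise 5 and 7 on E → aEb.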


Chapter 8

RAM Programs, Turing Machines, and the Partial Recursive Functions

See the scanned version of this chapter found on the web page for CIS511.

8.1 Partial Functions and RAM Programs

We define an abstract machine model for computing functions

f: Σ* × ··· × Σ* → Σ*   (n arguments),

where Σ = {a1, ..., ak} is some input alphabet. Numerical functions f: N^n → N can be viewed as functions defined over the one-letter alphabet {a1}, using the bijection m ↦ a1^m.

Let us recall the definition of a partial function.

Definition 8.1. A binary relation R ⊆ A × B between two sets A and B is functional iff, for all x ∈ A and y, z ∈ B, (x, y) ∈ R and (x, z) ∈ R implies that y = z. A partial function is a triple f = ⟨A, G, B⟩, where A and B are arbitrary sets (possibly empty) and G is a functional relation (possibly empty) between A and B, called the graph of f.

Hence, a partial function is a functional relation such that every argument has at most one image under f. The graph of a function f is denoted as graph(f). When no confusion can arise, a function f and its graph are usually identified.

A partial function f = ⟨A, G, B⟩ is often denoted as f: A → B. The domain dom(f) of a partial function f = ⟨A, G, B⟩ is the set

dom(f) = {x ∈ A | ∃y ∈ B, (x, y) ∈ G}.

For every element x ∈ dom(f), the unique element y ∈ B such that (x, y) ∈ graph(f) is denoted as f(x). We say that f(x) converges, also denoted as f(x)↓. If x ∈ A and x ∉ dom(f), we say that f(x) diverges, also denoted as f(x)↑.

Intuitively, if a function is partial, it does not return any output for any input not in its domain. This corresponds to an infinite computation.

A partial function f: A → B is a total function iff dom(f) = A. It is customary to call a total function simply a function.

We now define a model of computation known as the RAM programs, or Post machines. RAM programs are written in a sort of assembly language involving simple instructions manipulating strings stored into registers. Every RAM program uses a fixed and finite number of registers denoted as R1, ..., Rp, with no limitation on the size of strings held in the registers.

RAM programs can be defined either in flowchart form or in linear form. Since the linear form is more convenient for coding purposes, we present RAM programs in linear form. A RAM program P (in linear form) consists of a finite sequence of instructions using a finite number of registers R1, ..., Rp. Instructions may optionally be labeled with line numbers denoted as N1, ..., Nq. It is neither mandatory to label all instructions, nor to use distinct line numbers! Thus, the same line number can be used in more than one line. As we will see later on, this makes it easier to concatenate two different programs without performing a renumbering of line numbers.

Every instruction has four fields, not necessarily all used. The main field is the op-code. Here is an example of a RAM program to concatenate two strings x1 and x2.

    R3 ← R1
    R4 ← R2
N0  R4 jmpa N1b
    R4 jmpb N2b
    jmp N3b
N1  adda R3
    tail R4
    jmp N0a
N2  addb R3
    tail R4
    jmp N0a
N3  R1 ← R3
    continue

Definition 8.2. RAM programs are constructed from seven types of instructions, shown below:

(1j)   N  addj Y
(2)    N  tail Y
(3)    N  clr Y
(4)    N  Y ← X
(5a)   N  jmp N1a
(5b)   N  jmp N1b
(6ja)  N  Y jmpj N1a
(6jb)  N  Y jmpj N1b
(7)    N  continue

1. An instruction of type (1j) concatenates the letter aj to the right of the string held by register Y (1 ≤ j ≤ k). The effect is the assignment Y := Yaj.

2. An instruction of type (2) deletes the leftmost letter of the string held by the register Y. This corresponds to the function tail, defined such that

tail(ǫ) = ǫ,
tail(aj u) = u.

The effect is the assignment Y := tail(Y).

3. An instruction of type (3) clears register Y, i.e., sets its value to the empty string ǫ. The effect is the assignment Y := ǫ.

4. An instruction of type (4) assigns the value of register X to register Y. The effect is the assignment Y := X.

5. An instruction of type (5a) or (5b) is an unconditional jump. The effect of (5a) is to jump to the closest line number N1 occurring above the instruction being executed, and the effect of (5b) is to jump to the closest line number N1 occurring below the instruction being executed.

6. An instruction of type (6ja) or (6jb) is a conditional jump. Let head be the function defined as follows:

head(ǫ) = ǫ,
head(aj u) = aj.

The effect of (6ja) is to jump to the closest line number N1 occurring above the instruction being executed iff head(Y) = aj, else to execute the next instruction (the one immediately following the instruction being executed). The effect of (6jb) is to jump to the closest line number N1 occurring below the instruction being executed iff head(Y) = aj, else to execute the next instruction. When computing over N, instructions of type (6ja) or (6jb) jump to the closest N1 above or below iff Y is nonnull.

7. An instruction of type (7) is a no-op, i.e., the registers are unaffected. If there is a next instruction, then it is executed, else the program stops.

Obviously, a program is syntactically correct only if certain conditions hold.

Definition 8.3. A RAM program P is a finite sequence of instructions as in Definition 8.2, satisfying the following conditions:

(1) For every jump instruction (conditional or not), the line number to be jumped to must exist in P.

(2) The last instruction of a RAM program is a continue.

The reason for allowing multiple occurrences of line numbers is to make it easier to concatenate programs without having to perform a renaming of line numbers. The technical choice of jumping to the closest address N1 above or below comes from the fact that it is easy to search up or down using primitive recursion, as we will see later on.

For the purpose of computing a function f: Σ* × ··· × Σ* → Σ* (with n arguments) using a RAM program P, we assume that P has at least n registers called input registers, and that these registers R1, ..., Rn are initialized with the input values of the function f. We also assume that the output is returned in register R1.

The following RAM program concatenates two strings x1 and x2 held in registers R1 and R2.

    R3 ← R1
    R4 ← R2
N0  R4 jmpa N1b
    R4 jmpb N2b
    jmp N3b
N1  adda R3
    tail R4
    jmp N0a
N2  addb R3
    tail R4
    jmp N0a
N3  R1 ← R3
    continue

Since Σ = {a, b}, for more clarity, we wrote jmpa instead of jmp1, jmpb instead of jmp2, adda instead of add1, and addb instead of add2.

Definition 8.4. A RAM program P computes the partial function ϕ: (Σ*)^n → Σ* if the following conditions hold: for every input (x1, ..., xn) ∈ (Σ*)^n, having initialized the input registers R1, ..., Rn with x1, ..., xn, the program eventually halts iff ϕ(x1, ..., xn) converges, and if and when P halts, the value of R1 is equal to ϕ(x1, ..., xn). A partial function ϕ is RAM-computable iff it is computed by some RAM program.

For example, the following program computes the erase function E defined such that E(u) = ǫ for all u ∈ Σ*:

clr R1
continue

The following program computes the j-th successor function Sj defined such that Sj(u) = uaj for all u ∈ Σ*:

addj R1
continue

The following program (with n input variables) computes the projection function Pi^n defined such that

Pi^n(u1, ..., un) = ui,

where n ≥ 1 and 1 ≤ i ≤ n:

R1 ← Ri
continue

Note that P1^1 is the identity function.

Having a programming language, we would like to know how powerful it is, that is, we would like to know what kind of functions are RAM-computable. At first glance, RAM programs don't do much, but this is not so. Indeed, we will see shortly that the class of RAM-computable functions is quite extensive.
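The instruction semantics above can be exercised with a small interpreter. The following is a sketch; the tuple encoding of instructions and the function name run_ram are our own, not the text's, and labels are resolved to the closest occurrence above or below, as in items 5 and 6.

```python
def run_ram(prog, inputs):
    """Interpret a linear RAM program. prog is a list of tuples
    (label, op, *args); registers hold strings, and uninitialized
    registers hold the empty string."""
    R = {f"R{i + 1}": w for i, w in enumerate(inputs)}

    def find(label, pc, below):
        # jump to the *closest* occurrence of label above or below pc
        rng = range(pc + 1, len(prog)) if below else range(pc - 1, -1, -1)
        for i in rng:
            if prog[i][0] == label:
                return i
        raise ValueError(f"label {label} not found")

    pc = 0
    while pc < len(prog):
        _, op, *args = prog[pc]
        if op == "add":                 # add_j Y : Y := Y a_j
            j, Y = args
            R[Y] = R.get(Y, "") + j
        elif op == "tail":              # Y := tail(Y)
            R[args[0]] = R.get(args[0], "")[1:]
        elif op == "clr":               # Y := epsilon
            R[args[0]] = ""
        elif op == "move":              # Y := X
            Y, X = args
            R[Y] = R.get(X, "")
        elif op == "jmp":               # unconditional; below=True means N1b
            pc = find(args[0], pc, args[1])
            continue
        elif op == "cjmp":              # Y jmp_j N1 : jump iff head(Y) = a_j
            Y, j, label, below = args
            if R.get(Y, "")[:1] == j:
                pc = find(label, pc, below)
                continue
        pc += 1                         # continue and fall-through
    return R.get("R1", "")

# The concatenation program from the text, in this encoding:
concat = [
    (None, "move", "R3", "R1"),
    (None, "move", "R4", "R2"),
    ("N0", "cjmp", "R4", "a", "N1", True),
    (None, "cjmp", "R4", "b", "N2", True),
    (None, "jmp", "N3", True),
    ("N1", "add", "a", "R3"),
    (None, "tail", "R4"),
    (None, "jmp", "N0", False),
    ("N2", "add", "b", "R3"),
    (None, "tail", "R4"),
    (None, "jmp", "N0", False),
    ("N3", "move", "R1", "R3"),
    (None, "continue"),
]
```

Running run_ram(concat, ("ab", "ba")) copies R2 letter by letter onto the end of a copy of R1 and returns "abba" in R1.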

One way of getting new programs from previous ones is via composition. Another one is by primitive recursion. We will investigate these constructions after introducing another model of computation, Turing machines.

Remarkably, the classes of (partial) functions computed by RAM programs and by Turing machines are identical. This is the class of partial computable functions, also called partial recursive functions, a term which is now considered old-fashioned. This class can be given several other definitions. We will present the definition of the so-called µ-recursive functions (due to Kleene).

The following proposition will be needed to simplify the encoding of RAM programs as numbers.

Proposition 8.1. Every RAM program can be converted to an equivalent program only using the following types of instructions:

(1j)   N  addj Y
(2)    N  tail Y
(6ja)  N  Y jmpj N1a
(6jb)  N  Y jmpj N1b
(7)    N  continue

The proof is fairly simple. For example, instructions of the form Ri ← Rj can be eliminated by transferring the contents of Rj into an auxiliary register Rk, and then by transferring the contents of Rk into Ri and Rj.

8.2 Definition of a Turing Machine

We define a Turing machine model for computing functions f: Σ* × ··· × Σ* → Σ* (with n arguments), where Σ = {a1, ..., aN} is some input alphabet. We only consider deterministic Turing machines. A Turing machine also uses a tape alphabet Γ such that Σ ⊆ Γ. The tape alphabet contains some special symbol B ∉ Σ, the blank.

In this model, a Turing machine uses a single tape. This tape can be viewed as a string over Γ. The tape is both an input tape and a storage mechanism. Symbols on the tape can be overwritten, and the tape can grow either on the left or on the right. There is a read/write head pointing to some symbol on the tape.

Definition 8.5. A (deterministic) Turing machine (or TM) M is a sextuple M = (K, Σ, Γ, {L, R}, δ, q0), where

K is a finite set of states;
Σ is a finite input alphabet;
Γ is a finite tape alphabet, s.t. Σ ⊆ Γ, K ∩ Γ = ∅, and with blank B ∉ Σ;
q0 ∈ K is the start state (or initial state);
δ is the transition function, a (finite) set of quintuples

δ ⊆ K × Γ × Γ × {L, R} × K,

such that for all (p, a) ∈ K × Γ, there is at most one triple (b, m, q) ∈ Γ × {L, R} × K such that (p, a, b, m, q) ∈ δ.

A quintuple (p, a, b, m, q) ∈ δ is called an instruction. It is also denoted as p, a → b, m, q. The effect of an instruction is to switch from state p to state q, overwrite the symbol currently scanned a with b, and move the read/write head either left or right, according to m.

Here is an example of a Turing machine.

K = {q0, q1, q2, q3}; Σ = {a, b}; Γ = {a, b, B};

The instructions in δ are:

q0, B → B, R, q3,
q0, a → b, R, q1,
q0, b → a, R, q1,
q1, a → b, R, q1,
q1, b → a, R, q1,
q1, B → B, L, q2,
q2, a → a, L, q2,
q2, b → b, L, q2,
q2, B → B, R, q3.

8.3 Computations of Turing Machines

To explain how a Turing machine works, we describe its action on instantaneous descriptions. We take advantage of the fact that K ∩ Γ = ∅ to define instantaneous descriptions.

Definition 8.6. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), an instantaneous description (for short an ID) is a (nonempty) string in Γ*KΓ+, that is, a string of the form

upav,

where u, v ∈ Γ*, p ∈ K, and a ∈ Γ.

The intuition is that an ID upav describes a snapshot of a TM in the current state p, whose tape contains the string uav, and with the read/write head pointing to the symbol a. Thus, in upav, the state p is just to the left of the symbol a presently scanned by the read/write head.

We explain how a TM works by showing how it acts on IDs.

Definition 8.7. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), the yield relation (or compute relation) ⊢ is a binary relation defined on the set of IDs as follows. For any two IDs ID1 and ID2, we have ID1 ⊢ ID2 iff either

(1) (p, a, b, R, q) ∈ δ, and either

(a) ID1 = upacv, c ∈ Γ, and ID2 = ubqcv, or
(b) ID1 = upa and ID2 = ubqB;

or

(2) (p, a, b, L, q) ∈ δ, and either

(a) ID1 = ucpav, c ∈ Γ, and ID2 = uqcbv, or
(b) ID1 = pav and ID2 = qBbv.

Note how the tape is extended by one blank after the rightmost symbol in case (1)(b), and by one blank before the leftmost symbol in case (2)(b). As usual, we let ⊢+ denote the transitive closure of ⊢, and we let ⊢* denote the reflexive and transitive closure of ⊢.

We can now explain how a Turing machine computes a partial function f: Σ* × ··· × Σ* → Σ* (with n arguments). Since we allow functions taking n ≥ 1 input strings, we assume that Γ contains the special delimiter "," not in Σ, used to separate the various input strings.

It is convenient to assume that a Turing machine cleans up its tape when it halts, before returning its output. For this, we will define proper IDs.

Definition 8.8. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), where Γ contains some delimiter "," not in Σ in addition to the blank B, a starting ID is of the form

q0 w1, w2, ..., wn

where w1, ..., wn ∈ Σ* and n ≥ 2, or q0 w with w ∈ Σ+, or q0 B.

A blocking (or halting) ID is an ID upav such that there are no instructions (p, a, b, m, q) ∈ δ for any (b, m, q) ∈ Γ × {L, R} × K.

A proper ID is a halting ID of the form

B^k p w B^l,

where w ∈ Σ*, and k, l ≥ 0 (with l ≥ 1 when w = ǫ).

Computation sequences are defined as follows.

Definition 8.9. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), a computation sequence (or computation) is a finite or infinite sequence of IDs

ID0, ID1, ..., IDi, IDi+1, ...,

such that IDi ⊢ IDi+1 for all i ≥ 0. A computation sequence halts iff it is a finite sequence of IDs, so that ID0 ⊢* IDn, and IDn is a halting ID. A computation sequence diverges if it is an infinite sequence of IDs.

We now explain how a Turing machine computes a partial function.

Definition 8.10. A Turing machine M = (K, Σ, Γ, {L, R}, δ, q0) computes the partial function f: Σ* × ··· × Σ* → Σ* (with n arguments) iff the following conditions hold:

(1) For every w1, ..., wn ∈ Σ*, given the starting ID

ID0 = q0 w1, w2, ..., wn

or q0 w with w ∈ Σ+, or q0 B, the computation sequence of M from ID0 halts in a proper ID iff f(w1, ..., wn) is defined.

(2) If f(w1, ..., wn) is defined, then M halts in a proper ID of the form

IDn = B^k p f(w1, ..., wn) B^h,

which means that it computes the right value.

A function f (over Σ*) is Turing-computable iff it is computed by some Turing machine M.
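Definitions 8.5 through 8.10 can be exercised with a small simulator. The sketch below represents an ID as a triple (u, p, v), with the head on the first symbol of v, and follows the two cases of the yield relation; the dictionary encoding of δ and the function name tm_run are our own. The machine encoded is the example of Section 8.2, which exchanges the a's and b's in a string.

```python
def tm_run(delta, q0, w, blank="B", max_steps=10_000):
    """Run a deterministic TM on IDs (u, p, v): the tape is u + v and the
    head scans v[0]. delta maps (p, a) -> (b, m, q)."""
    u, p, v = "", q0, (w if w else blank)
    for _ in range(max_steps):
        a = v[0]
        if (p, a) not in delta:
            return u, p, v                       # halting ID
        b, m, q = delta[(p, a)]
        if m == "R":                             # case (1): write b, move right
            u, p, v = u + b, q, (v[1:] or blank)
        else:                                    # case (2): write b, move left
            u, p, v = u[:-1], q, (u[-1] if u else blank) + b + v[1:]
    raise RuntimeError("machine did not halt within max_steps")

# The example machine of Section 8.2, exchanging a's and b's:
swap = {
    ("q0", "B"): ("B", "R", "q3"),
    ("q0", "a"): ("b", "R", "q1"),
    ("q0", "b"): ("a", "R", "q1"),
    ("q1", "a"): ("b", "R", "q1"),
    ("q1", "b"): ("a", "R", "q1"),
    ("q1", "B"): ("B", "L", "q2"),
    ("q2", "a"): ("a", "L", "q2"),
    ("q2", "b"): ("b", "L", "q2"),
    ("q2", "B"): ("B", "R", "q3"),
}
```

On input aab, the machine halts in state q3 in a proper ID whose tape contents, blanks stripped, is bba.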

Note that by (1), the TM M may halt in an improper ID, in which case f(w1, ..., wn) must be undefined. This corresponds to the fact that we only accept to retrieve the output of a computation if the TM has cleaned up its tape, i.e., produced a proper ID. In particular, intermediate calculations have to be erased before halting.

Example.

K = {q0, q1, q2, q3}; Σ = {a, b}; Γ = {a, b, B};

The instructions in δ are:

q0, B → B, R, q3,
q0, a → b, R, q1,
q0, b → a, R, q1,
q1, a → b, R, q1,
q1, b → a, R, q1,
q1, B → B, L, q2,
q2, a → a, L, q2,
q2, b → b, L, q2,
q2, B → B, R, q3.

The reader can easily verify that this machine exchanges the a's and b's in a string. For example, on input w = aab, the output is bba.

8.4 RAM-computable functions are Turing-computable

Turing machines can simulate RAM programs, and as a result, we have the following theorem.

Theorem 8.2. Every RAM-computable function is Turing-computable. Furthermore, given a RAM program P, we can effectively construct a Turing machine M computing the same function.

The idea of the proof is to represent the contents of the registers R1, ..., Rp on the Turing machine tape by the string

#r1#r2# ··· #rp#,

where # is a special marker and ri represents the string held by Ri. We also use Proposition 8.1 to reduce the number of instructions to be dealt with. The Turing machine M is built of blocks, each block simulating the effect of some instruction of the program P. The details are a bit tedious, and can be found in the notes or in Machtey and Young.

8.5 Turing-computable functions are RAM-computable

RAM programs can also simulate Turing machines.

Theorem 8.3. Every Turing-computable function is RAM-computable. Furthermore, given a Turing machine M, one can effectively construct a RAM program P computing the same function.

The idea of the proof is to design a RAM program containing an encoding of the current ID of the Turing machine M in register R1, and to use other registers R2, R3 to simulate the effect of executing an instruction of M by updating the ID of M in R1. The details are tedious and can be found in the notes.

Another proof can be obtained by proving that the class of Turing-computable functions coincides with the class of partial computable functions (formerly called partial recursive functions). Indeed, it turns out that both RAM programs and Turing machines compute precisely the class of partial recursive functions. For this, we need to define the primitive recursive functions.

Informally, a primitive recursive function is a total recursive function that can be computed using only for loops, that is, loops in which the number of iterations is fixed (unlike a while loop). A formal definition of the primitive recursive functions is given in Section 8.7.

Definition 8.11. Let Σ = {a1, ..., aN}. The class of partial computable functions, also called partial recursive functions, is the class of partial functions (over Σ*) that can be computed by RAM programs (or equivalently by Turing machines). The class of computable functions, also called recursive functions, is the subset of the class of partial computable functions consisting of functions defined for every input (i.e., total functions).

We can also deal with languages.

8.6 Computably Enumerable Languages and Computable Languages

We define the computably enumerable languages, also called listable languages, and the computable languages. The old-fashioned terminology for computably enumerable languages is recursively enumerable languages, and for computable languages is recursive languages. We assume that the TMs under consideration have a tape alphabet containing the special symbols 0 and 1.

Definition 8.12. Let Σ = {a1, ..., aN}. A language L ⊆ Σ* is (Turing) computably enumerable (for short, a c.e. set), or (Turing) listable (or recursively enumerable (for short, an r.e. set)), iff there is some TM M such that for every w ∈ L, M halts in a proper ID with the output 1, and for every w ∉ L, either M halts in a proper ID with the output 0, or it runs forever.

A language L ⊆ Σ* is (Turing) computable (or recursive) iff there is some TM M such that for every w ∈ L, M halts in a proper ID with the output 1, and for every w ∉ L, M halts in a proper ID with the output 0.

Thus, given a computably enumerable language L, for some w ∉ L, it is possible that a TM accepting L runs forever on input w. On the other hand, for a computable (recursive) language L, a TM accepting L always halts in a proper ID.

When dealing with languages, it is often useful to consider nondeterministic Turing machines. Such machines are defined just like deterministic Turing machines, except that their transition function

δ ⊆ K × Γ × Γ × {L, R} × K

is just a (finite) set of quintuples with no particular extra condition. It can be shown that every nondeterministic Turing machine can be simulated by a deterministic Turing machine, and thus, nondeterministic Turing machines also accept the class of c.e. sets.

It can be shown that a computably enumerable language is the range of some computable (recursive) function. It can also be shown that a language L is computable (recursive) iff both L and its complement are computably enumerable. There are computably enumerable languages that are not computable (recursive).
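The argument behind "L is computable iff both L and its complement are c.e." is dovetailing: run a semi-decider for L and one for its complement in lockstep; exactly one of them eventually accepts. The toy sketch below is ours: Python generators stand in for step-by-step TM simulations, yielding None for each simulated step and True on acceptance, and the language decided (strings of even length) is chosen only for illustration.

```python
def make_decider(semi_in, semi_out):
    """Dovetail a semi-decider for L with one for its complement.
    Each semi-decider is a generator yielding None per simulated step
    and True on acceptance (and never True on inputs it rejects)."""
    def decide(w):
        g_in, g_out = semi_in(w), semi_out(w)
        while True:                     # run both machines in lockstep
            if next(g_in):
                return True
            if next(g_out):
                return False
    return decide

def semi(parity):
    """Toy semi-decider: accepts w iff len(w) % 2 == parity, and 'runs
    forever' (keeps yielding None) otherwise."""
    def run(w):
        for _ in range(len(w)):
            yield None                  # simulated computation steps
        if len(w) % 2 == parity:
            yield True
        while True:
            yield None
    return run

even = make_decider(semi(0), semi(1))
```

Because exactly one of the two generators eventually yields True on any input, decide always halts, even though each semi-decider on its own may run forever.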
Turing machines were invented by Turing around 1935. The primitive recursive functions were known to Hilbert circa 1890. Gödel formalized their definition in 1929. The partial recursive functions were defined by Kleene around 1934.

Church also introduced the λ-calculus as a model of computation around 1935. Other models: Post systems, Markov systems. The equivalence of the various models of computation was shown around 1935/36. RAM programs were only defined around 1963 (they are a slight generalization of Post systems).

A further study of the partial recursive functions requires the notions of pairing functions and of universal functions (or universal Turing machines).

8.7 The Primitive Recursive Functions

The class of primitive recursive functions is defined in terms of base functions and closure operations.

Definition 8.13. Let Σ = {a1, ..., aN}. The base functions over Σ are the following functions:

(1) The erase function E, defined such that E(w) = ǫ, for all w ∈ Σ*;

(2) For every j, 1 ≤ j ≤ N, the j-successor function Sj, defined such that Sj(w) = waj, for all w ∈ Σ*;

(3) The projection functions Pi^n, defined such that Pi^n(w1, ..., wn) = wi, for every n ≥ 1, every i, 1 ≤ i ≤ n, and for all w1, ..., wn ∈ Σ*.

Note that P1^1 is the identity function on Σ*. Projection functions can be used to permute the arguments of another function.

A crucial closure operation is (extended) composition.

Definition 8.14. Let Σ = {a1, ..., aN}. For any function g: Σ* × ··· × Σ* → Σ* (with m arguments), and any m functions hi: Σ* × ··· × Σ* → Σ* (with n arguments), the composition of g and the hi is the function f: Σ* × ··· × Σ* → Σ* (with n arguments), denoted as g ◦ (h1, ..., hm), such that

f(w1, ..., wn) = g(h1(w1, ..., wn), ..., hm(w1, ..., wn)),

for all w1, ..., wn ∈ Σ*.

As an example, f = g ∘ (P_2^2, P_1^2) is such that

f(w_1, w_2) = g(P_2^2(w_1, w_2), P_1^2(w_1, w_2)) = g(w_2, w_1).

Another crucial closure operation is primitive recursion.

Definition. Let Σ = {a_1, ..., a_N}. For any function

g: Σ* × ··· × Σ* → Σ*   (m − 1 arguments),

where m ≥ 2, and any N functions

h_i: Σ* × ··· × Σ* → Σ*   (m + 1 arguments),

the function

f: Σ* × ··· × Σ* → Σ*   (m arguments)

is defined by primitive recursion from g and h_1, ..., h_N, if

f(ε, w_2, ..., w_m) = g(w_2, ..., w_m),
f(u a_1, w_2, ..., w_m) = h_1(u, f(u, w_2, ..., w_m), w_2, ..., w_m),
...
f(u a_N, w_2, ..., w_m) = h_N(u, f(u, w_2, ..., w_m), w_2, ..., w_m),

for all u, w_2, ..., w_m ∈ Σ*.

When m = 1, for some fixed w ∈ Σ*, we have

f(ε) = w,
f(u a_1) = h_1(u, f(u)),
...
f(u a_N) = h_N(u, f(u)),

for all u ∈ Σ*.

For numerical functions (i.e., when Σ = {a_1}), the scheme of primitive recursion is simpler:

f(0, x_2, ..., x_m) = g(x_2, ..., x_m),
f(x + 1, x_2, ..., x_m) = h_1(x, f(x, x_2, ..., x_m), x_2, ..., x_m),

for all x, x_2, ..., x_m ∈ N. The successor function S is the function S(x) = x + 1.

Addition, multiplication, exponentiation, and super-exponentiation can be defined by primitive recursion as follows (being a bit loose, we should use some projections ...):

add(0, n) = P_1^1(n) = n,
add(m + 1, n) = S ∘ P_2^3(m, add(m, n), n) = S(add(m, n)),

mult(0, n) = E(n) = 0,
mult(m + 1, n) = add ∘ (P_2^3, P_3^3)(m, mult(m, n), n) = add(mult(m, n), n),

rexp(0, n) = S ∘ E(n) = 1,
rexp(m + 1, n) = mult(rexp(m, n), n),

exp(m, n) = rexp ∘ (P_2^2, P_1^2)(m, n),

supexp(0, n) = 1,
supexp(m + 1, n) = exp(n, supexp(m, n)).

We usually write m + n for add(m, n), m · n or even mn for mult(m, n), and m^n for exp(m, n).

There is a minus operation on N named monus. This operation, denoted by ∸, is defined by

m ∸ n = m − n if m ≥ n, and 0 if m < n.

To show that it is primitive recursive, we define the function pred. Let pred be the primitive recursive function given by

pred(0) = 0,
pred(m + 1) = P_1^2(m, pred(m)) = m.

Then monus is defined by

monus(m, 0) = m,
monus(m, n + 1) = pred(monus(m, n)),

except that the above is not a legal primitive recursion. It is left as an exercise to give a proper primitive recursive definition of monus.
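The numerical scheme above can be transcribed almost literally. The following sketch (not part of the notes, and using Python integers for N) builds add, mult, exp, pred, and monus from a generic primitive-recursion combinator `prim_rec`, mirroring the equations just given:

```python
def prim_rec(g, h):
    """Return f defined by primitive recursion from g and h:
       f(0, *xs) = g(*xs);  f(m + 1, *xs) = h(m, f(m, *xs), *xs)."""
    def f(m, *xs):
        acc = g(*xs)
        for i in range(m):
            acc = h(i, acc, *xs)
        return acc
    return f

add   = prim_rec(lambda n: n, lambda m, r, n: r + 1)       # S(add(m, n))
mult  = prim_rec(lambda n: 0, lambda m, r, n: add(r, n))   # add(mult(m, n), n)
rexp  = prim_rec(lambda n: 1, lambda m, r, n: mult(r, n))  # rexp(m, n) = n^m
exp   = lambda m, n: rexp(n, m)                            # exp = rexp o (P2, P1)
pred  = prim_rec(lambda: 0, lambda m, r: m)                # pred(m + 1) = m
monus = lambda m, n: prim_rec(lambda: m, lambda k, r: pred(r))(n)

print(add(3, 4), mult(3, 4), exp(2, 10), monus(5, 3), monus(3, 5))
# 7 12 1024 2 0
```

Note how `exp` is obtained from `rexp` purely by swapping arguments, exactly the role the projections P_2^2, P_1^2 play in the text, and how the illegal monus recursion becomes legal once it recurses on its second argument.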

As an example over {a, b}, the following function g: Σ* × Σ* → Σ* is defined by primitive recursion:

g(ε, v) = P_1^1(v),
g(u a_i, v) = S_i ∘ P_2^3(u, g(u, v), v),

where 1 ≤ i ≤ N. It is easily verified that g(u, v) = vu. Then,

f = g ∘ (P_2^2, P_1^2)

computes the concatenation function, i.e., f(u, v) = uv.

The following functions are also primitive recursive:

sg(n) = 1 if n > 0, 0 if n = 0,

as well as

sg̅(n) = 0 if n > 0, 1 if n = 0,

and

|m − n| = (m ∸ n) + (n ∸ m),

and

eq(m, n) = 1 if m = n, 0 if m ≠ n.

Indeed,

sg(0) = 0,
sg(n + 1) = S ∘ E ∘ P_1^2(n, sg(n)) = S(E(n)) = 1,

sg̅(n) = 1 ∸ sg(n),

eq(m, n) = sg̅(|m − n|).

Finally, the function

cond(m, n, p, q) = p if m = n, q if m ≠ n,

is primitive recursive since

cond(m, n, p, q) = eq(m, n) · p + sg̅(eq(m, n)) · q.
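These small arithmetic predicates compose cleanly once monus is available. A minimal sketch (not part of the notes; monus is restated directly so the snippet is self-contained):

```python
# monus, restated so this snippet stands alone
monus = lambda m, n: max(m - n, 0)

sg      = lambda n: monus(1, monus(1, n))          # 1 if n > 0 else 0
sg_bar  = lambda n: monus(1, sg(n))                # flipped sign function
absdiff = lambda m, n: monus(m, n) + monus(n, m)   # |m - n|
eq      = lambda m, n: sg_bar(absdiff(m, n))       # 1 if m == n else 0
cond    = lambda m, n, p, q: eq(m, n) * p + sg_bar(eq(m, n)) * q

print(sg(7), sg(0), eq(3, 3), eq(2, 5), cond(2, 2, 10, 20), cond(1, 2, 10, 20))
# 1 0 1 0 10 20
```

The expression for `sg` here differs slightly from the recursion in the text (it uses monus twice instead of primitive recursion), which is one of several equivalent ways to obtain it.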

We can also design more general versions of cond. For example, define compare_≤ as

compare_≤(m, n) = 1 if m ≤ n, 0 if m > n,

which is given by

compare_≤(m, n) = 1 ∸ sg(m ∸ n).

Then we can define

cond_≤(m, n, p, q) = p if m ≤ n, q if m > n,

with

cond_≤(m, n, p, q) = compare_≤(m, n) · p + sg̅(compare_≤(m, n)) · q.

The above allows us to define functions by cases.

Definition. Let Σ = {a_1, ..., a_N}. The class of primitive recursive functions is the smallest class of functions (over Σ*) which contains the base functions and is closed under composition and primitive recursion.

We leave as an exercise to show that every primitive recursive function is a total function. The class of primitive recursive functions may not seem very big, but it contains all the total functions that we would ever want to compute.

Although it is rather tedious to prove, the following theorem can be shown.

Theorem 8.4. For an alphabet Σ = {a_1, ..., a_N}, every primitive recursive function is Turing computable.

The best way to prove the above theorem is to use the computation model of RAM programs. Indeed, it was shown in Theorem 8.2 that every RAM program can be converted to a Turing machine. It is also rather easy to show that the primitive recursive functions are RAM-computable.

In order to define new functions it is also useful to use predicates.

Definition. An n-ary predicate P (over Σ*) is any subset of (Σ*)^n. We write that a tuple (x_1, ..., x_n) satisfies P as (x_1, ..., x_n) ∈ P or as P(x_1, ..., x_n). The characteristic function of a predicate P is the function C_P: (Σ*)^n → {a_1}* defined by

C_P(x_1, ..., x_n) = a_1 iff P(x_1, ..., x_n), and ε iff not P(x_1, ..., x_n).

A predicate P is primitive recursive iff its characteristic function C_P is primitive recursive.

We leave to the reader the obvious adaptation of the notion of primitive recursive predicate to functions defined over N. In this case, 0 plays the role of ε and 1 plays the role of a_1.

It is easily shown that if P and Q are primitive recursive predicates (over (Σ*)^n), then P ∨ Q, P ∧ Q and ¬P are also primitive recursive.

As an exercise, the reader may want to prove that the predicate (defined over N)

prime(n) iff n is a prime number,

is a primitive recursive predicate.

For any fixed k ≥ 1, the function

ord(k, n) = exponent of the kth prime in the prime factorization of n,

is a primitive recursive function.

We can also define functions by cases.

Proposition 8.5. If P_1, ..., P_n are pairwise disjoint primitive recursive predicates (which means that P_i ∩ P_j = ∅ for all i ≠ j) and f_1, ..., f_{n+1} are primitive recursive functions, the function g defined below is also primitive recursive:

g(x) = f_1(x) iff P_1(x), ..., f_n(x) iff P_n(x), f_{n+1}(x) otherwise

(writing x for (x_1, ..., x_n)).

It is also useful to have bounded quantification and bounded minimization.

Definition. If P is an (n + 1)-ary predicate, then the bounded existential predicate ∃y/x P(y, z) holds iff some prefix y of x makes P(y, z) true. The bounded universal predicate ∀y/x P(y, z) holds iff every prefix y of x makes P(y, z) true.

Proposition 8.6. If P is an (n + 1)-ary primitive recursive predicate, then ∃y/x P(y, z) and ∀y/x P(y, z) are also primitive recursive predicates.

As an application, we can show that the equality predicate, u = v?, is primitive recursive.

Definition. If P is an (n + 1)-ary predicate, then the bounded minimization of P, min y/x P(y, z), is the function defined such that min y/x P(y, z) is the shortest prefix y of x such that P(y, z) if such a y exists, and x a_1 otherwise. The bounded maximization of P, max y/x P(y, z), is the function defined such that max y/x P(y, z) is the longest prefix y of x such that P(y, z) if such a y exists, and x a_1 otherwise.

Proposition 8.7. If P is an (n + 1)-ary primitive recursive predicate, then min y/x P(y, z) and max y/x P(y, z) are primitive recursive functions.

So far, the primitive recursive functions do not yield all the Turing-computable functions. In order to get a larger class of functions, we need the closure operation known as minimization.

8.8 The Partial Computable Functions

Minimization can be viewed as an abstract version of a while loop. Let Σ = {a_1, ..., a_N}. For any function

g: Σ* × ··· × Σ* → Σ*   (m + 1 arguments),

where m ≥ 0, for every j, 1 ≤ j ≤ N, the function

f: Σ* × ··· × Σ* → Σ*   (m arguments)

looks for the shortest string u over a_j (for a given j) such that g(u, w_1, ..., w_m) = ε:

u := ε;
while g(u, w_1, ..., w_m) ≠ ε do
    u := u a_j;
endwhile
let f(w_1, ..., w_m) = u

The operation of minimization (sometimes called minimalization) is defined as follows.

Definition. Let Σ = {a_1, ..., a_N}. For any function

g: Σ* × ··· × Σ* → Σ*   (m + 1 arguments),

where m ≥ 0, for every j, 1 ≤ j ≤ N, the function

f: Σ* × ··· × Σ* → Σ*   (m arguments)

is defined by minimization over {a_j}* from g, if the following conditions hold for all w_1, ..., w_m ∈ Σ*:

(1) f(w_1, ..., w_m) is defined iff there is some n ≥ 0 such that g(a_j^p, w_1, ..., w_m) is defined for all p, 0 ≤ p ≤ n, and

g(a_j^n, w_1, ..., w_m) = ε.

(2) When f(w_1, ..., w_m) is defined,

f(w_1, ..., w_m) = a_j^n,

where n is such that

g(a_j^n, w_1, ..., w_m) = ε

and

g(a_j^p, w_1, ..., w_m) ≠ ε

for every p, 0 ≤ p ≤ n − 1.

We also write

f(w_1, ..., w_m) = min_j u [g(u, w_1, ..., w_m) = ε].

Note: When f(w_1, ..., w_m) is defined, f(w_1, ..., w_m) = a_j^n, where n is the smallest integer such that condition (1) holds. It is very important to require that all the values g(a_j^p, w_1, ..., w_m) be defined for all p, 0 ≤ p ≤ n, when defining f(w_1, ..., w_m). Failure to do so allows non-computable functions.

Remark: Kleene used the µ-notation:

f(w_1, ..., w_m) = µ_j u [g(u, w_1, ..., w_m) = ε],

actually, its numerical form:

f(x_1, ..., x_m) = µx [g(x, x_1, ..., x_m) = 0].

The class of partial computable functions is defined as follows.

Definition. Let Σ = {a_1, ..., a_N}. The class of partial computable functions, also called partial recursive functions, is the smallest class of partial functions (over Σ*) which contains the base functions and is closed under composition, primitive recursion, and minimization. The class of computable functions, also called recursive functions, is the subset of the class of partial computable functions consisting of functions defined for every input (i.e., total functions).

One of the major results of computability theory is the following theorem.
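The while loop above translates directly into code. A minimal sketch (not part of the notes): strings over {a_j} are represented as `'a' * n`, and a step cap is added purely as a demo guard, since true minimization may legitimately fail to terminate.

```python
def minimize(g, *ws, cap=10_000):
    """f(ws) = shortest u in {a}* with g(u, *ws) = '' (the empty string
    playing the role of epsilon). The cap is only a safety guard for
    this demo; real minimization has no such bound."""
    u = ''
    while g(u, *ws) != '':
        u += 'a'
        if len(u) > cap:
            raise RuntimeError('no witness found below cap')
    return u

# Toy g: returns epsilon once u is at least as long as w,
# so f(w) = 'a' * len(w).
g = lambda u, w: '' if len(u) >= len(w) else 'a'
print(repr(minimize(g, 'abc')))  # 'aaa'
```

The cap also illustrates the caveat in the text: if `g` were undefined (looped) on some shorter prefix, the search could never safely "skip over" it, which is why the definition insists that all values g(a_j^p, ...) for p ≤ n be defined.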

Theorem 8.8. For an alphabet Σ = {a_1, ..., a_N}, every partial computable function (partial recursive function) is Turing-computable. Conversely, every Turing-computable function is a partial computable function (partial recursive function). Similarly, the class of computable functions (recursive functions) is equal to the class of Turing-computable functions that halt in a proper ID for every input.

To prove that every partial computable function is indeed Turing-computable, since by Theorem 8.2 every RAM program can be converted to a Turing machine, the simplest thing to do is to show that every partial computable function is RAM-computable. For the converse, one can show that given a Turing machine, there is a primitive recursive function describing how to go from one ID to the next. Then, minimization is used to guess whether a computation halts. The proof shows that every partial computable function needs minimization at most once. The characterization of the computable functions in terms of TM's follows easily.

There are computable functions (recursive functions) that are not primitive recursive. Such an example is given by Ackermann's function. This is a function A: N × N → N which is defined by the following recursive clauses:

A(0, y) = y + 1,
A(x + 1, 0) = A(x, 1),
A(x + 1, y + 1) = A(x, A(x + 1, y)).

It turns out that A is a computable function which is not primitive recursive. It can be shown that:

A(0, x) = x + 1,
A(1, x) = x + 2,
A(2, x) = 2x + 3,
A(3, x) = 2^(x+3) − 3,

and

A(4, x) = 2^(2^(···^2)) − 3   (a tower of x + 3 twos),

with A(4, 0) = 16 − 3 = 13. For example,

A(4, 1) = 2^16 − 3 = 65533,   A(4, 2) = 2^65536 − 3.

Actually, it is not so obvious that A is a total function. This can be shown by induction, using the lexicographic ordering ⪯ on N × N, which is defined as follows:

(m, n) ⪯ (m′, n′) iff either m = m′ and n = n′, or m < m′, or m = m′ and n < n′.

We write (m, n) ≺ (m′, n′) when (m, n) ⪯ (m′, n′) and (m, n) ≠ (m′, n′).

We prove that A(m, n) is defined for all (m, n) ∈ N × N by complete induction over the lexicographic ordering on N × N.

In the base case, (m, n) = (0, 0), and since A(0, n) = n + 1, we have A(0, 0) = 1, and A(0, 0) is defined.

For (m, n) ≠ (0, 0), the induction hypothesis is that A(m′, n′) is defined for all (m′, n′) ≺ (m, n). We need to conclude that A(m, n) is defined.

If m = 0, since A(0, n) = n + 1, A(0, n) is defined.

If m ≠ 0 and n = 0, since (m − 1, 1) ≺ (m, 0), by the induction hypothesis, A(m − 1, 1) is defined, but A(m, 0) = A(m − 1, 1), and thus A(m, 0) is defined.

If m ≠ 0 and n ≠ 0, since (m, n − 1) ≺ (m, n), by the induction hypothesis, A(m, n − 1) is defined. Since (m − 1, A(m, n − 1)) ≺ (m, n), by the induction hypothesis, A(m − 1, A(m, n − 1)) is defined. But A(m, n) = A(m − 1, A(m, n − 1)), and thus A(m, n) is defined.

Thus, A(m, n) is defined for all (m, n) ∈ N × N.

It is possible to show that A is a recursive function, although the quickest way to prove it requires some fancy machinery (the recursion theorem). Proving that A is not primitive recursive is harder.

The following proposition shows that restricting ourselves to total functions is too limiting. Let F be any set of total functions that contains the base functions and is closed under composition and primitive recursion (and thus, F contains all the primitive recursive functions).
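The three recursive clauses can be checked numerically against the closed forms listed above. A small sketch (not part of the notes; memoization and a raised recursion limit are just practical conveniences, since even small inputs generate deep call chains):

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100_000)

@lru_cache(maxsize=None)
def A(x, y):
    """Ackermann's function, transcribed from its defining clauses."""
    if x == 0:
        return y + 1
    if y == 0:
        return A(x - 1, 1)
    return A(x - 1, A(x, y - 1))

# Spot-check the closed forms from the text on small arguments.
assert all(A(1, y) == y + 2 for y in range(10))
assert all(A(2, y) == 2 * y + 3 for y in range(10))
assert all(A(3, y) == 2 ** (y + 3) - 3 for y in range(6))
print(A(4, 0))  # 13
```

Going beyond A(4, 0) by direct recursion is hopeless: A(4, 1) already requires on the order of 2^16 nested calls, a concrete taste of why A outgrows every primitive recursive bound.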

Definition. We say that a function f: Σ* × Σ* → Σ* is universal for the one-argument functions in F iff for every function g: Σ* → Σ* in F, there is some n ∈ N such that

f(a_1^n, u) = g(u)

for all u ∈ Σ*.

Proposition 8.9. For any countable set F of total functions containing the base functions and closed under composition and primitive recursion, if f is a universal function for the functions g: Σ* → Σ* in F, then f ∉ F.

Proof. Assume that the universal function f is in F. Let g be the function such that

g(u) = f(a_1^|u|, u) a_1

for all u ∈ Σ*. We claim that g ∈ F. It is enough to prove that the function h such that h(u) = a_1^|u| is primitive recursive, which is easily shown. Then, because f is universal, there is some m such that

g(u) = f(a_1^m, u)

for all u ∈ Σ*. Letting u = a_1^m, we get

g(a_1^m) = f(a_1^m, a_1^m) = f(a_1^m, a_1^m) a_1,

a contradiction.

Thus, either a universal function for F is partial, or it is not in F.


Chapter 9

Universal RAM Programs and Undecidability of the Halting Problem

9.1 Pairing Functions

Pairing functions are used to encode pairs of integers into single integers, or more generally, finite sequences of integers into single integers. We begin by exhibiting a bijective pairing function J: N² → N. The function J corresponds to a certain way of enumerating the pairs of integers (x, y): it sweeps the grid N × N along its successive descending diagonals, first the diagonal x + y = 0, then x + y = 1, and so on. [Figure: the graph of J, partially shown as a grid with arrows along the descending diagonals.]

Note that the value of x + y is constant along each descending diagonal, and consequently, we have

J(x, y) = 1 + 2 + ··· + (x + y) + x
        = ((x + y)(x + y + 1) + 2x)/2
        = ((x + y)² + 3x + y)/2,

that is,

J(x, y) = ((x + y)² + 3x + y)/2.

For example, J(0, 3) = 6, J(1, 2) = 7, J(2, 2) = 12, J(3, 1) = 13, J(4, 0) = 14.

Let K: N → N and L: N → N be the projection functions onto the axes, that is, the unique functions such that

K(J(a, b)) = a and L(J(a, b)) = b,

for all a, b ∈ N. For example, K(11) = 1, and L(11) = 3; K(12) = 2, and L(12) = 2; K(13) = 3 and L(13) = 1.

The functions J, K, L are called Cantor's pairing functions. They were used by Cantor to prove that the set Q of rational numbers is countable.

Clearly, J is primitive recursive, since it is given by a polynomial. It is not hard to prove that J is injective and surjective, and that it is strictly monotonic in each argument, which means that for all x, x′, y, y′ ∈ N, if x < x′ then J(x, y) < J(x′, y), and if y < y′ then J(x, y) < J(x, y′).

The projection functions can be computed explicitly, although this is a bit tricky. We only need to observe that by monotonicity of J, x ≤ J(x, y) and y ≤ J(x, y), and thus,

K(z) = min(x ≤ z)(∃y ≤ z)[J(x, y) = z],

and

L(z) = min(y ≤ z)(∃x ≤ z)[J(x, y) = z].

Therefore, K and L are primitive recursive. It can be verified that J(K(z), L(z)) = z, for all z ∈ N.

More explicit formulae can be given for K and L. If we define

Q_1(z) = ⌊(⌊√(8z + 1)⌋ + 1)/2⌋ − 1,
Q_2(z) = 2z − (Q_1(z))²,

then it can be shown that

K(z) = (1/2)(Q_2(z) − Q_1(z)),
L(z) = Q_1(z) − (1/2)(Q_2(z) − Q_1(z)).

In the above formulae, the function m ↦ ⌊√m⌋ yields the largest integer s such that s² ≤ m. It can be computed by a RAM program.
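The explicit formulae for K and L are easy to check numerically. A short sketch (not part of the notes; `math.isqrt` computes exactly the integer square root ⌊√m⌋ used in Q_1):

```python
import math

def J(x, y):
    """Cantor's pairing function."""
    return ((x + y) ** 2 + 3 * x + y) // 2

def _q1_q2(z):
    q1 = (math.isqrt(8 * z + 1) + 1) // 2 - 1
    return q1, 2 * z - q1 * q1

def K(z):
    q1, q2 = _q1_q2(z)
    return (q2 - q1) // 2

def L(z):
    q1, q2 = _q1_q2(z)
    return q1 - (q2 - q1) // 2

print([J(0, 3), J(1, 2), J(2, 2), J(3, 1), J(4, 0)])  # [6, 7, 12, 13, 14]
print(K(11), L(11))  # 1 3
```

Since Q_2(z) − Q_1(z) works out to 2x for z = J(x, y), the halving in K and L is always exact, so integer division loses nothing here.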

The pairing function J(x, y) is also denoted as ⟨x, y⟩, and K and L are also denoted as Π_1 and Π_2.

By induction, we can define bijections between N^n and N for all n ≥ 1. We let ⟨z⟩_1 = z, ⟨x_1, x_2⟩_2 = ⟨x_1, x_2⟩, and

⟨x_1, ..., x_n, x_{n+1}⟩_{n+1} = ⟨x_1, ..., x_{n−1}, ⟨x_n, x_{n+1}⟩⟩_n.

For example,

⟨x_1, x_2, x_3⟩_3 = ⟨x_1, ⟨x_2, x_3⟩⟩_2 = ⟨x_1, ⟨x_2, x_3⟩⟩,
⟨x_1, x_2, x_3, x_4⟩_4 = ⟨x_1, x_2, ⟨x_3, x_4⟩⟩_3 = ⟨x_1, ⟨x_2, ⟨x_3, x_4⟩⟩⟩,
⟨x_1, x_2, x_3, x_4, x_5⟩_5 = ⟨x_1, x_2, x_3, ⟨x_4, x_5⟩⟩_4 = ⟨x_1, ⟨x_2, ⟨x_3, ⟨x_4, x_5⟩⟩⟩⟩.

It can be shown by induction on n that

⟨x_1, ..., x_n, x_{n+1}⟩_{n+1} = ⟨x_1, ⟨x_2, ..., x_{n+1}⟩_n⟩.

The function ⟨−, ..., −⟩_n: N^n → N is called an extended pairing function. Observe that if z = ⟨x_1, ..., x_n⟩_n, then

x_1 = Π_1(z),
x_2 = Π_1(Π_2(z)),
x_3 = Π_1(Π_2(Π_2(z))),
x_4 = Π_1(Π_2(Π_2(Π_2(z)))),
x_5 = Π_2(Π_2(Π_2(Π_2(z))))

(illustrated here for n = 5).

We can also define a uniform projection function Π with the following property: if z = ⟨x_1, ..., x_n⟩_n, with n ≥ 2, then Π(i, n, z) = x_i for all i, where 1 ≤ i ≤ n. The idea is to view z as an n-tuple, and Π(i, n, z) as the i-th component of that n-tuple. The function Π is defined by cases as follows:

Π(i, 0, z) = 0, for all i ≥ 0,
Π(i, 1, z) = z, for all i ≥ 0,
Π(i, 2, z) = Π_1(z), if 0 ≤ i ≤ 1,
Π(i, 2, z) = Π_2(z), for all i ≥ 2,

and for all n ≥ 2,

Π(i, n + 1, z) = Π(i, n, z) if 0 ≤ i < n,
                 Π_1(Π(n, n, z)) if i = n,
                 Π_2(Π(n, n, z)) if i > n.

By a previous exercise, this is a legitimate primitive recursive definition. Some basic properties of Π are given as exercises. In particular, the following properties are easily shown:

(a) ⟨0, ..., 0⟩_n = 0, ⟨x, 0⟩ = ⟨x, 0, ..., 0⟩_n;

(b) Π(0, n, z) = Π(1, n, z) and Π(i, n, z) = Π(n, n, z), for all i ≥ n and all n, z ∈ N;

(c) ⟨Π(1, n, z), ..., Π(n, n, z)⟩_n = z, for all n ≥ 1 and all z ∈ N;

(d) Π(i, n, z) ≤ z, for all i, n, z ∈ N;

(e) There is a primitive recursive function Large, such that

Π(i, n + 1, Large(n + 1, z)) = z,

for i, n, z ∈ N.

As a first application, we observe that we need only consider partial computable functions (partial recursive functions)¹ of a single argument. Indeed, let ϕ: N^n → N be a partial computable function of n ≥ 2 arguments. Let

ϕ̂(z) = ϕ(Π(1, n, z), ..., Π(n, n, z)),

for all z ∈ N. Then, ϕ̂ is a partial computable function of a single argument, and ϕ can be recovered from ϕ̂, since

ϕ(x_1, ..., x_n) = ϕ̂(⟨x_1, ..., x_n⟩).

Thus, using ⟨−, ..., −⟩ and Π as coding and decoding functions, we can restrict our attention to functions of a single argument.

Next, we show that there exist coding and decoding functions between Σ* and {a_1}*, and that partial computable functions over Σ* can be recoded as partial computable functions over {a_1}*. Since {a_1}* is isomorphic to N, this shows that we can restrict our attention to functions defined over N.

¹ The term partial recursive is now considered old-fashioned. Many researchers have switched to the term partial computable.

9.2 Equivalence of Alphabets

Given an alphabet Σ = {a_1, ..., a_k}, strings over Σ can be ordered by viewing strings as numbers in a number system where the digits are a_1, ..., a_k. In this number system, which is almost the number system with base k, the string a_1 corresponds to zero, and a_k to k − 1. Hence, we have a kind of shifted number system in base k.

For example, if Σ = {a, b, c}, a listing of Σ* in the ordering corresponding to the number system begins with

a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, aab, aac, aba, abb, abc, ...

Clearly, there is an ordering function from Σ* to N which is a bijection. Indeed, if u = a_{i_1} ··· a_{i_n}, this function f: Σ* → N is given by

f(u) = i_1 k^(n−1) + i_2 k^(n−2) + ··· + i_{n−1} k + i_n.

Since we also want a decoding function, we define the coding function C_k: Σ* → Σ* as follows: C_k(ε) = ε, and if u = a_{i_1} ··· a_{i_n}, then

C_k(u) = a_1^(i_1 k^(n−1) + i_2 k^(n−2) + ··· + i_{n−1} k + i_n).

The function C_k is primitive recursive, because

C_k(ε) = ε,
C_k(x a_i) = C_k(x)^k a_1^i.
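This shifted base-k system is what is elsewhere called bijective base-k numeration, and both directions are short to implement. A sketch, not part of the notes, using digit values 1..k so that ε corresponds to 0:

```python
def encode(u, alphabet):
    """Numeric rank of u in the shifted base-k system (epsilon -> 0)."""
    k = len(alphabet)
    n = 0
    for ch in u:
        n = k * n + alphabet.index(ch) + 1   # digits run 1..k, never 0
    return n

def decode(n, alphabet):
    """Inverse of encode: the string of rank n."""
    k = len(alphabet)
    out = []
    while n > 0:
        n, d = divmod(n - 1, k)   # shifted division step; d + 1 in 1..k
        out.append(alphabet[d])
    return ''.join(reversed(out))

sigma = ['a', 'b', 'c']
print([decode(n, sigma) for n in range(1, 13)])
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

The enumeration produced by `decode` reproduces exactly the listing of Σ* given in the text, and the absence of a zero digit is what makes the map a bijection rather than the usual many-to-one positional notation (where leading zeros collide).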

The inverse of C_k is a function D_k: {a_1}* → Σ*. However, primitive recursive functions are total, and we need to extend D_k to Σ*. This is easily done by letting

D_k(x) = D_k(a_1^|x|)

for all x ∈ Σ*. It remains to define D_k by primitive recursion over {a_1}*. For this, we introduce three auxiliary functions p, q, r, defined as follows. Let

p(ε) = ε,
p(x a_i) = x a_i, if i ≠ k,
p(x a_k) = p(x).

Note that p(x) is the result of deleting the consecutive a_k's in the tail of x. Let

q(ε) = ε,
q(x a_i) = q(x) a_1.

Note that q(x) = a_1^|x|. Finally, let

r(ε) = a_1,
r(x a_i) = x a_{i+1}, if i ≠ k,
r(x a_k) = x a_k.

The function r is almost the successor function, for the ordering. Then, the trick is that D_k(x a_i) is the successor of D_k(x) in the ordering, and if

D_k(x) = y a_j a_k^n

with j ≠ k, since the successor of y a_j a_k^n is y a_{j+1} a_k^n, we can use r. Thus, we have

D_k(ε) = ε,
D_k(x a_i) = r(p(D_k(x))) q(D_k(x) − p(D_k(x))).

Then, both C_k and D_k are primitive recursive, and C_k ∘ D_k = D_k ∘ C_k = id.

Let ϕ: Σ* × ··· × Σ* → Σ* be a partial function over Σ*, and let

ϕ⁺(x_1, ..., x_n) = C_k(ϕ(D_k(x_1), ..., D_k(x_n))).

The function ϕ⁺ is defined over {a_1}*. Also, for any partial function ψ over {a_1}*, let

ψ⁻(x_1, ..., x_n) = D_k(ψ(C_k(x_1), ..., C_k(x_n))).

We claim that if ψ is a partial computable function over {a_1}*, then ψ⁻ is partial computable over Σ*, and that if ϕ is a partial computable function over Σ*, then ϕ⁺ is partial computable over {a_1}*.

First, ψ can be extended to Σ* by letting

ψ(x) = ψ(a_1^|x|)

for all x ∈ Σ*, and so, if ψ is partial computable, then so is ψ⁻ by composition. This seems equally obvious for ϕ and ϕ⁺, but there is a difficulty. The problem is that ϕ⁺ is defined as a composition of functions over Σ*. We have to show how ϕ⁺ can be defined directly over {a_1}* without using any additional alphabet symbols. This is done in Machtey and Young [12]; see Section 2.2.

Pairing functions can also be used to prove that certain functions are primitive recursive, even though their definition is not a legal primitive recursive definition. For example, consider the Fibonacci function defined as follows:

f(0) = 1,
f(1) = 1,
f(n + 2) = f(n + 1) + f(n),

for all n ∈ N. This is not a legal primitive recursive definition, since f(n + 2) depends both on f(n + 1) and f(n). In a primitive recursive definition, g(y + 1, x) is only allowed to depend upon g(y, x).

Definition 9.1. Given any function f: N^(n+1) → N, the function f̄: N^(n+1) → N defined such that

f̄(y, x) = ⟨f(0, x), ..., f(y, x)⟩_(y+1)

is called the course-of-value function for f.

The following proposition holds.

Proposition 9.1. Given any function f: N^(n+1) → N, f is primitive recursive iff f̄ is primitive recursive.

Proof. First, it is necessary to define a function con such that if x = ⟨x_1, ..., x_m⟩ and y = ⟨y_1, ..., y_n⟩, where m, n ≥ 1, then

con(m, x, y) = ⟨x_1, ..., x_m, y_1, ..., y_n⟩.

This fact is left as an exercise. Now, if f is primitive recursive, let

f̄(0, x) = f(0, x),
f̄(y + 1, x) = con(y + 1, f̄(y, x), f(y + 1, x)),

showing that f̄ is primitive recursive. Conversely, if f̄ is primitive recursive, then

f(y, x) = Π(y + 1, y + 1, f̄(y, x)),

and so, f is primitive recursive.

Remark: Why is it that

f̄(y + 1, x) = ⟨f̄(y, x), f(y + 1, x)⟩

does not work?

We define course-of-value recursion as follows.

Definition 9.2. Given any two functions g: N^n → N and h: N^(n+2) → N, the function f: N^(n+1) → N is defined by course-of-value recursion from g and h if

f(0, x) = g(x),
f(y + 1, x) = h(y, f̄(y, x), x).

The following proposition holds.

Proposition 9.2. If f: N^(n+1) → N is defined by course-of-value recursion from g and h and g, h are primitive recursive, then f is primitive recursive.

Proof. We prove that f̄ is primitive recursive. Then, by Proposition 9.1, f is also primitive recursive. To prove that f̄ is primitive recursive, observe that

f̄(0, x) = f(0, x) = g(x),
f̄(y + 1, x) = con(y + 1, f̄(y, x), h(y, f̄(y, x), x)).

When we use Proposition 9.2 to prove that a function is primitive recursive, we rarely bother to construct a formal course-of-value recursion. Instead, we simply indicate how the value of f(y + 1, x) can be obtained in a primitive recursive manner from f(0, x) through f(y, x). Thus, an informal use of Proposition 9.2 shows that the Fibonacci function is primitive recursive. A rigorous proof of this fact is left as an exercise.
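The informal use of Proposition 9.2 amounts to letting the recursive step see the whole history of previous values. The sketch below is not from the notes: it replaces the coded history f̄(y, x) = ⟨f(0, x), ..., f(y, x)⟩ by a plain Python list, which carries the same information without the pairing machinery.

```python
def course_of_values(g0, h):
    """f(0) = g0;  f(y + 1) = h(y, [f(0), ..., f(y)]).
    The list plays the role of the coded history f-bar(y)."""
    def f(y):
        hist = [g0]
        for i in range(y):
            hist.append(h(i, hist))
        return hist[y]
    return f

# Fibonacci: f(0) = f(1) = 1, f(n + 2) = f(n + 1) + f(n).
fib = course_of_values(1, lambda y, hist: hist[y] + hist[y - 1] if y >= 1 else 1)
print([fib(n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

Note that `h` receives the step index together with the full history, exactly the data h(y, f̄(y, x), x) gets in Definition 9.2 (with no side parameters x in this unary instance).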

9.3 Coding of RAM Programs

In this section, we present a specific encoding of RAM programs which allows us to treat programs as integers. Encoding programs as integers also allows us to have programs that take other programs as input, and we obtain a universal program. Universal programs have the property that given two inputs, the first one being the code of a program and the second one an input data, the universal program simulates the actions of the encoded program on the input data. A coding scheme is also called an indexing or a Gödel numbering, in honor of Gödel, who invented this technique.

From results of the previous chapter, without loss of generality, we can restrict our attention to RAM programs computing partial functions of one argument over N. Furthermore, we only need the following kinds of instructions, each instruction being coded as shown below. Since we are considering functions over the natural numbers, which corresponds to a one-letter alphabet, there is only one kind of instruction of the form add and jmp (and add increments by 1 the contents of the specified register Rj).

Ni add Rj       code = ⟨1, i, j, 0⟩
Ni tail Rj      code = ⟨2, i, j, 0⟩
Ni continue     code = ⟨3, i, 1, 0⟩
Ni Rj jmp Nka   code = ⟨4, i, j, k⟩
Ni Rj jmp Nkb   code = ⟨5, i, j, k⟩

Recall that a conditional jump causes a jump to the closest address Nk above or below iff Rj is nonzero, and if Rj is null, the next instruction is executed. We assume that all lines in a RAM program are numbered. This is always feasible, by labeling unnamed instructions with a new and unused line number.

The code of an instruction I is denoted as #I. To simplify the notation, we introduce the following decoding primitive recursive functions Typ, Nam, Reg, and Jmp, defined as follows:

Typ(x) = Π(1, 4, x),
Nam(x) = Π(2, 4, x),
Reg(x) = Π(3, 4, x),
Jmp(x) = Π(4, 4, x).

These functions yield the type, line number, register name, and line number jumped to, if any, for an instruction coded by x. Note that we have no need to interpret the values of these functions if x does not code an instruction.
We can define the primitive recursive predicate INST, such that INST(x) holds iff x codes an instruction. First, we need the connective ⇒ (implies), defined such that P ⇒ Q iff ¬P ∨ Q.

Then, INST(x) holds iff:

[1 ≤ Typ(x) ≤ 5] ∧ [1 ≤ Reg(x)]
∧ [Typ(x) ≤ 3 ⇒ Jmp(x) = 0]
∧ [Typ(x) = 3 ⇒ Reg(x) = 1].

Programs are coded as follows. If P is a RAM program composed of the n instructions I_1, ..., I_n, the code of P, denoted as #P, is

#P = ⟨n, #I_1, ..., #I_n⟩.

Recall from a previous exercise that

⟨n, #I_1, ..., #I_n⟩ = ⟨n, ⟨#I_1, ..., #I_n⟩⟩.

Also recall that

⟨x, y⟩ = ((x + y)² + 3x + y)/2.

Consider the following program Padd2 computing the function add2: N → N given by add2(n) = n + 2.

I_1: 1 add R1
I_2: 2 add R1
I_3: 3 continue

We have

#I_1 = ⟨1, 1, 1, 0⟩_4 = ⟨1, ⟨1, ⟨1, 0⟩⟩⟩ = 37,
#I_2 = ⟨1, 2, 1, 0⟩_4 = ⟨1, ⟨2, ⟨1, 0⟩⟩⟩ = 92,
#I_3 = ⟨3, 3, 1, 0⟩_4 = ⟨3, ⟨3, ⟨1, 0⟩⟩⟩ = 234,

and

#Padd2 = ⟨3, #I_1, #I_2, #I_3⟩_4 = ⟨3, ⟨37, ⟨92, 234⟩⟩⟩ = 1018748519973070618.

The codes get big fast!

We define the primitive recursive functions Ln, Pg, and Line, such that:

Ln(x) = Π(1, 2, x),
Pg(x) = Π(2, 2, x),
Line(i, x) = Π(i, Ln(x), Pg(x)).
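The instruction codes above are straightforward to reproduce from the pairing function. A sketch (not part of the notes) that computes the extended pairing by repeatedly pairing the last two components, per the definition ⟨x_1, ..., x_{n+1}⟩_{n+1} = ⟨x_1, ..., x_{n−1}, ⟨x_n, x_{n+1}⟩⟩_n:

```python
def J(x, y):
    """Cantor's pairing function <x, y>."""
    return ((x + y) ** 2 + 3 * x + y) // 2

def tup(*xs):
    """Extended pairing <x1, ..., xn>_n: pair the last two, repeat."""
    xs = list(xs)
    while len(xs) > 1:
        y = xs.pop()
        x = xs.pop()
        xs.append(J(x, y))
    return xs[0]

I1 = tup(1, 1, 1, 0)   # code of "1 add R1"
I2 = tup(1, 2, 1, 0)   # code of "2 add R1"
I3 = tup(3, 3, 1, 0)   # code of "3 continue"
print(I1, I2, I3)           # 37 92 234
print(tup(3, I1, I2, I3))   # 1018748519973070618
```

Running this confirms the hand computations in the text, and shows concretely how fast the Gödel numbers grow: a three-line program already has a 19-digit code.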

The function Ln yields the length of the program (the number of instructions), Pg yields the sequence of instructions in the program (really, a code for the sequence), and Line(i, x) yields the code of the ith instruction in the program. Again, if x does not code a program, there is no need to interpret these functions. However, note that by a previous exercise, it happens that Line(0, x) = Line(1, x), and Line(i, x) = Line(Ln(x), x), for all i ≥ Ln(x).

The primitive recursive predicate PROG is defined such that PROG(x) holds iff x codes a program. Thus, PROG(x) holds iff each line codes an instruction, each jump has an instruction to jump to, and the last instruction is a continue. Thus, PROG(x) holds iff

∀i ≤ Ln(x) [i ≥ 1 ⇒ [INST(Line(i, x)) ∧ Typ(Line(Ln(x), x)) = 3
∧ [Typ(Line(i, x)) = 4 ⇒ ∃j ≤ i − 1 [j ≥ 1 ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]]
∧ [Typ(Line(i, x)) = 5 ⇒ ∃j ≤ Ln(x) [j > i ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]]]]

Note that we have used the fact, proved as an exercise, that if f is a primitive recursive function and P is a primitive recursive predicate, then ∃x ≤ f(y) P(x) is primitive recursive.

We are now ready to prove a fundamental result in the theory of algorithms. This result points out some of the limitations of the notion of algorithm.

Theorem 9.3. (Undecidability of the halting problem) There is no RAM program Decider which halts for all inputs and has the following property when started with input x in register R1 and with input i in register R2 (the other registers being set to zero):

(1) Decider halts with output 1 iff i codes a program that eventually halts when started on input x (all other registers set to zero).

(2) Decider halts with output 0 in R1 iff i codes a program that runs forever when started on input x in R1 (all other registers set to zero).

(3) If i does not code a program, then Decider halts with output 2 in R1.

Proof. Assume that Decider is such a RAM program, and let Q be the following program with a single input:

Program Q (code q)

        R2 ← R1
        Decider
N1      continue
        R1 jmp N1a
        continue

Let i be the code of some program P. The key point is that the termination behavior of Q on input i is exactly the opposite of the termination behavior of Decider on input i and code i.

(1) If Decider says that the program P coded by i halts on input i, then R1 just after the continue in line N1 contains 1, and Q loops forever.

(2) If Decider says that the program P coded by i loops forever on input i, then R1 just after the continue in line N1 contains 0, and Q halts.

The program Q can be translated into a program using only instructions of type 1, 2, 3, 4, 5, described previously, and let q be the code of this program Q.

Let us see what happens if we run the program Q on input q in R1 (all other registers set to zero).

Just after execution of the assignment R2 ← R1, the program Decider is started with q in both R1 and R2. Since Decider is supposed to halt for all inputs, it eventually halts with output 0 or 1 in R1. If Decider halts with output 1 in R1, then Q goes into an infinite loop, while if Decider halts with output 0 in R1, then Q halts. But then, because of the definition of Decider, we see that Decider says that Q halts when started on input q iff Q loops forever on input q, and that Q loops forever on input q iff Q halts on input q, a contradiction. Therefore, Decider cannot exist.

If we identify the notion of algorithm with that of a RAM program which halts for all inputs, the above theorem says that there is no algorithm for deciding whether a RAM program eventually halts for a given input. We say that the halting problem for RAM programs is undecidable (or unsolvable).

The above theorem also implies that the halting problem for Turing machines is undecidable. Indeed, if we had an algorithm for solving the halting problem for Turing machines, we could solve the halting problem for RAM programs as follows: first, apply the algorithm for translating a RAM program into an equivalent Turing machine, and then apply the algorithm solving the halting problem for Turing machines.
The argument is typical in computability theory and is called a reducibility argument.

Our next goal is to define a primitive recursive function that describes the computation of RAM programs. Assume that we have a RAM program P using n registers R1, ..., Rn, whose contents are denoted as r_1, ..., r_n. We can code r_1, ..., r_n into a single integer ⟨r_1, ..., r_n⟩. Conversely, every integer x can be viewed as coding the contents of R1, ..., Rn, by taking the sequence

Π(1, n, x), ..., Π(n, n, x).

Actually, it is not necessary to know n, the number of registers, if we make the following observation:

Reg(Line(i, x)) ≤ Line(i, x) ≤ Pg(x)

for all i, x ∈ N. Then, if x codes a program, then R1, ..., Rx certainly include all the registers in the program. Also note that from a previous exercise,

⟨r_1, ..., r_n, 0, ..., 0⟩ = ⟨r_1, ..., r_n, 0⟩.

We now define the primitive recursive functions Nextline, Nextcont, and Comp, describing the computation of RAM programs.

Definition 9.3. Let x code a program and let i be such that 1 ≤ i ≤ Ln(x). The following functions are defined:

(1) Nextline(i, x, y) is the number of the next instruction to be executed after executing the ith instruction in the program coded by x, where the contents of the registers is coded by y.

(2) Nextcont(i, x, y) is the code of the contents of the registers after executing the ith instruction in the program coded by x, where the contents of the registers is coded by y.

(3) Comp(x, y, m) = ⟨i, z⟩, where i and z are defined such that after running the program coded by x for m steps, where the initial contents of the program registers are coded by y, the next instruction to be executed is the ith one, and z is the code of the current contents of the registers.

Proposition 9.4. The functions Nextline, Nextcont, and Comp are primitive recursive.

Proof. (1) Nextline(i, x, y) = i + 1, unless the ith instruction is a jump and the contents of the register being tested is nonzero:

Nextline(i, x, y) =
    max j ≤ Ln(x) [j < i ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]
        if Typ(Line(i, x)) = 4 ∧ Π(Reg(Line(i, x)), x, y) ≠ 0,
    min j ≤ Ln(x) [j > i ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]
        if Typ(Line(i, x)) = 5 ∧ Π(Reg(Line(i, x)), x, y) ≠ 0,
    i + 1 otherwise.

Note that according to this definition, if the ith line is the final continue, then Nextline signals that the program has halted by yielding Nextline(i, x, y) > Ln(x).

(2) We need two auxiliary functions Add and Sub defined as follows.

Add(j, x, y) is the number coding the contents of the registers used by the program coded by x after register Rj coded by Π(j, x, y) has been increased by 1, and

Sub(j, x, y) codes the contents of the registers after register Rj has been decremented by 1 (y codes the previous contents of the registers). It is easy to see that

Sub(j, x, y) = min z ≤ y [Π(j, x, z) = Π(j, x, y) − 1 and ∀k ≤ x [0 < k and k ≠ j ⟹ Π(k, x, z) = Π(k, x, y)]].

The definition of Add is slightly more tricky. We leave it as an exercise to the reader to prove that

Add(j, x, y) = min z ≤ Large(x, y + 1) [Π(j, x, z) = Π(j, x, y) + 1 and ∀k ≤ x [0 < k and k ≠ j ⟹ Π(k, x, z) = Π(k, x, y)]],

where the function Large is the function defined in an earlier exercise. Then

Nextcont(i, x, y) =
    Add(Reg(Line(i, x)), x, y)   if Typ(Line(i, x)) = 1,
    Sub(Reg(Line(i, x)), x, y)   if Typ(Line(i, x)) = 2,
    y                            if Typ(Line(i, x)) ≥ 3.

(3) Recall that Π₁(z) = Π(1, 2, z) and Π₂(z) = Π(2, 2, z). The function Comp is defined by primitive recursion as follows:

Comp(x, y, 0) = ⟨1, y⟩
Comp(x, y, m + 1) = ⟨Nextline(Π₁(Comp(x, y, m)), x, Π₂(Comp(x, y, m))), Nextcont(Π₁(Comp(x, y, m)), x, Π₂(Comp(x, y, m)))⟩.

Recall that Π₁(Comp(x, y, m)) is the number of the next instruction to be executed and that Π₂(Comp(x, y, m)) codes the current contents of the registers.

We can now reprove that every RAM-computable function is partial computable. Indeed, assume that x codes a program P. We define the partial function End so that for all x, y, where x codes a program and y codes the contents of its registers, End(x, y) is the number of steps for which the computation runs before halting, if it halts. If the program does not halt, then End(x, y) is undefined. We have

End(x, y) = min m [Π₁(Comp(x, y, m)) = Ln(x)].

(If y is the value of the register R1 before the program P coded by x is started, recall that the contents of the registers are coded by ⟨y, 0⟩. Noticing that 0 and 1 do not code programs, we note that if x codes a program, then x ≥ 2, and Π₁(z) = Π(1, x, z) is the contents of R1 as coded by z.)
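Stripped of the Gödel coding (a program is a list of labeled instructions and the registers a dictionary, rather than single integers), the step structure of Nextline, Nextcont, Comp, and End can be sketched in Python. This is only an illustration: the tuple encoding of instructions and all function names below are mine, not the text's, and a fuel bound stands in for the genuine partiality of End.

```python
# A program is a list of (label, instruction) pairs; an instruction is
# ("add", j), ("tail", j), ("continue",), or ("jmp", j, target_label, d)
# with d = -1 for a backward jump (Typ 4) and d = +1 for a forward jump (Typ 5).

def nextline(i, prog, regs):
    # Program counter after executing instruction i (1-based), mirroring
    # Nextline(i, x, y): a jump is taken only when the tested register is
    # nonzero, and resolves to the nearest line carrying the target label
    # in the jump's direction.
    _, ins = prog[i - 1]
    if ins[0] == "jmp" and regs.get(ins[1], 0) != 0:
        _, _, target, d = ins
        hits = [j for j in range(1, len(prog) + 1)
                if prog[j - 1][0] == target and (j < i if d < 0 else j > i)]
        return max(hits) if d < 0 else min(hits)
    return i + 1

def nextcont(i, prog, regs):
    # Register contents after executing instruction i, mirroring Nextcont.
    _, ins = prog[i - 1]
    regs = dict(regs)
    if ins[0] == "add":
        regs[ins[1]] = regs.get(ins[1], 0) + 1
    elif ins[0] == "tail":
        regs[ins[1]] = max(regs.get(ins[1], 0) - 1, 0)
    return regs

def comp(prog, regs, m):
    # Comp(x, y, m): the pair (next instruction, registers) after m steps.
    i = 1
    for _ in range(m):
        if i > len(prog):  # already past the final continue
            break
        i, regs = nextline(i, prog, regs), nextcont(i, prog, regs)
    return i, regs

def end(prog, regs, fuel=10**5):
    # End(x, y) = min m [pc of Comp(x, y, m) = Ln(x)]; the fuel bound
    # stands in for genuine divergence, since End is only partial.
    for m in range(fuel + 1):
        if comp(prog, regs, m)[0] == len(prog):
            return m
    return None
```

On the three-line program that adds 2 to R1, for instance, end returns 2 and comp then reads off the result from the registers, exactly as in the formula ϕ(y) = Π₁(Π₂(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩)))).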

Since Comp(x, y, m) = ⟨i, z⟩, we have Π₁(Comp(x, y, m)) = i, where i is the number (index) of the instruction reached after running the program P coded by x with initial values of the registers coded by y for m steps. Thus, P halts if i is the last instruction in P, namely Ln(x), iff Π₁(Comp(x, y, m)) = Ln(x). End is a partial computable function; it can be computed by a RAM program involving only one while loop searching for the number of steps m. However, in general, End is not a total function.

If ϕ is the partial computable function computed by the program P coded by x, then we have

ϕ(y) = Π₁(Π₂(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩)))).

This is because if m = End(x, ⟨y, 0⟩) is the number of steps after which the program P coded by x halts on input y, then

Comp(x, ⟨y, 0⟩, m) = ⟨Ln(x), z⟩,

where z is the code of the register contents when the program stops. Consequently

z = Π₂(Comp(x, ⟨y, 0⟩, m)) = Π₂(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩))).

The value of the register R1 is Π₁(z), that is,

ϕ(y) = Π₁(Π₂(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩)))).

Observe that ϕ is written in the form ϕ = g ∘ min f, for some primitive recursive functions f and g.

We can also exhibit a partial computable function which enumerates all the unary partial computable functions. It is a universal function. Abusing the notation slightly, we will write ϕ(x, y) for ϕ(⟨x, y⟩), viewing ϕ as a function of two arguments (however, ϕ is really a function of a single argument). We define the function ϕ_univ as follows:

ϕ_univ(x, y) = Π₁(Π₂(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩))))   if PROG(x),
               undefined                                   otherwise.

The function ϕ_univ is a partial computable function with the following property: for every x coding a RAM program P, for every input y,

ϕ_univ(x, y) = ϕ_x(y),

the value of the partial computable function ϕ_x computed by the RAM program P coded by x. If x does not code a program, then ϕ_univ(x, y) is undefined for all y.

By Proposition 8.9, the partial function ϕ_univ is not computable (recursive).² Indeed, being an enumerating function for the partial computable functions, it is an enumerating function for the total computable functions, and thus, it cannot be computable. Being a partial function saves us from a contradiction.

The existence of the function ϕ_univ leads us to the notion of an indexing of the RAM programs.

We can define a listing of the RAM programs as follows. If x codes a program (that is, if PROG(x) holds) and P is the program that x codes, we call this program P the xth RAM program and denote it as P_x. If x does not code a program, we let P_x be the program that diverges for every input:

N1   add R1
N1   R1 jmp N1
N1   continue

Therefore, in all cases, P_x stands for the xth RAM program. Thus, we have a listing of RAM programs, P_0, P_1, P_2, P_3, ..., such that every RAM program (of the restricted type considered here) appears in the list exactly once, except for the infinite loop program. For example, the program Padd2 (adding 2 to an integer) appears in the list as some P_x.

In particular, note that ϕ_univ being a partial computable function, it is computed by some RAM program UNIV that has a code univ and is the program P_univ in the list.

Having an indexing of the RAM programs, we also have an indexing of the partial computable functions.

Definition 9.4. For every integer x ≥ 0, we let P_x be the RAM program coded by x as defined earlier, and ϕ_x be the partial computable function computed by P_x.

For example, the function add2 (adding 2 to an integer) appears as some ϕ_x.

Remark: Kleene used the notation {x} for the partial computable function coded by x. Due to the potential confusion with singleton sets, we follow Rogers, and use the notation ϕ_x.

² The term recursive function is now considered old-fashioned. Many researchers have switched to the term computable function.

It is important to observe that different programs P_x and P_y may compute the same function, that is, while P_x ≠ P_y for all x ≠ y, it is possible that ϕ_x = ϕ_y. In fact, it is undecidable whether ϕ_x = ϕ_y.

The existence of the universal function ϕ_univ is sufficiently important to be recorded in the following proposition.

Proposition 9.5. For the indexing of RAM programs defined earlier, there is a universal partial computable function ϕ_univ such that, for all x, y ∈ N, if ϕ_x is the partial computable function computed by P_x, then

ϕ_x(y) = ϕ_univ(⟨x, y⟩).

The program UNIV computing ϕ_univ can be viewed as an interpreter for RAM programs. By giving the universal program UNIV the program x and the data y, we get the result of executing program P_x on input y. We can view the RAM model as a stored program computer.

By Theorem 9.3 and Proposition 9.5, the halting problem for the single program UNIV is undecidable. Otherwise, the halting problem for RAM programs would be decidable, a contradiction. It should be noted that the program UNIV can actually be written (with a certain amount of pain).

The object of the next section is to show the existence of Kleene's T-predicate. This will yield another important normal form. In addition, the T-predicate is a basic tool in recursion theory.

9.4 Kleene's T-Predicate

In Section 9.3, we have encoded programs. The idea of this section is to also encode computations of RAM programs. Assume that x codes a program, that y is some input (not a code), and that z codes a computation of P_x on input y. The predicate T(x, y, z) is defined as follows:

T(x, y, z) holds iff x codes a RAM program, y is an input, and z codes a halting computation of P_x on input y.

We will show that T is primitive recursive. First, we need to encode computations.
We say that z codes a computation of length n ≥ 1 if

z = ⟨n + 2, ⟨1, y_0⟩, ⟨i_1, y_1⟩, ..., ⟨i_n, y_n⟩⟩,

where each i_j is the physical location of the next instruction to be executed and each y_j codes the contents of the registers just before execution of the instruction at location i_j. Also, y_0 codes the initial contents of the registers, that is, y_0 = ⟨y, 0⟩ for some input y. We let Ln(z) = Π₁(z). Note that i_j denotes the physical location of the next instruction to

be executed in the sequence of instructions constituting the program coded by x, and not the line number (label) of this instruction. Thus, the first instruction to be executed is in location 1, 1 ≤ i_j ≤ Ln(x), and i_{n−1} = Ln(x). Since the last instruction which is executed is the last physical instruction in the program, namely a continue, there is no next instruction to be executed after that, and i_n is irrelevant. Writing the definition of T is a little simpler if we let i_n = Ln(x) + 1.

Definition 9.5. The T-predicate is the primitive recursive predicate defined as follows: T(x, y, z) iff PROG(x) and (Ln(z) ≥ 3) and

∀j ≤ Ln(z) − 3 [0 ≤ j ⟹
    Nextline(Π₁(Π(j + 2, Ln(z), z)), x, Π₂(Π(j + 2, Ln(z), z))) = Π₁(Π(j + 3, Ln(z), z)) and
    Nextcont(Π₁(Π(j + 2, Ln(z), z)), x, Π₂(Π(j + 2, Ln(z), z))) = Π₂(Π(j + 3, Ln(z), z))] and
Π₁(Π(Ln(z) − 1, Ln(z), z)) = Ln(x) and
Π₁(Π(2, Ln(z), z)) = 1 and
y = Π₁(Π₂(Π(2, Ln(z), z))) and Π₂(Π₂(Π(2, Ln(z), z))) = 0.

The reader can verify that T(x, y, z) holds iff x codes a RAM program, y is an input, and z codes a halting computation of P_x on input y.

In order to extract the output of P_x from z, we define the primitive recursive function Res as follows:

Res(z) = Π₁(Π₂(Π(Ln(z), Ln(z), z))).

The explanation for this formula is that Res(z) is the contents of register R1 when P_x halts, that is, Π₁(y_n).

Using the T-predicate, we get the so-called Kleene normal form.

Theorem 9.6. (Kleene Normal Form) Using the indexing of the partial computable functions defined earlier, we have

ϕ_x(y) = Res[min z (T(x, y, z))],

where T(x, y, z) and Res are primitive recursive.

Note that the universal function ϕ_univ can be defined as

ϕ_univ(x, y) = Res[min z (T(x, y, z))].

There is another important property of the partial computable functions, namely, that composition is effective. We need two auxiliary primitive recursive functions. The function Conprogs creates the code of the program obtained by concatenating the programs P_x and P_y, and for i ≥ 2, Cumclr(i) is the code of the program which clears registers R2, ..., Ri.
To get Cumclr, we can use the function clr(i) such that clr(i) is the code of the program

N1   tail Ri
N1   Ri jmp N1
N    continue
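At the same uncoded level (lists of labeled instructions rather than Gödel numbers), the constructions behind Conprogs, Cumclr, and the composition of Theorem 9.7 below can be sketched as list manipulations. One detail such a sketch must handle explicitly is that concatenated programs must not capture each other's jump labels, so labels are tagged apart; the representation and all names are mine, not the text's.

```python
def clr(i):
    # The clr(i) program above, in a toy list representation:
    #   N1  tail Ri ; N1  Ri jmp N1 ; N  continue
    return [("N1", ("tail", i)),
            ("N1", ("jmp", i, "N1", -1)),
            ("N", ("continue",))]

def rename(prog, tag):
    # Tag every label (and jump target) so that the label spaces of
    # concatenated programs stay disjoint.
    fix = lambda lab: None if lab is None else (tag, lab)
    return [(fix(lab),
             ("jmp", ins[1], (tag, ins[2]), ins[3]) if ins[0] == "jmp" else ins)
            for lab, ins in prog]

def conprogs(p, q):
    # Conprogs: concatenation of two programs, labels kept apart.
    return rename(p, "p") + rename(q, "q")

def cumclr(i):
    # Cumclr(i): a program clearing registers R2, ..., Ri.
    prog = []
    for j in range(2, i + 1):
        prog = conprogs(prog, clr(j))
    return prog

def compose(px, py, nregs):
    # The construction of Theorem 9.7 below: run P_y, clear R2, ..., Rnregs,
    # then run P_x, so the result computes phi_x o phi_y.
    return conprogs(py, conprogs(cumclr(nregs), px))
```

The arithmetized versions in the text do the same surgery on program codes instead of on lists.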

We leave it as an exercise to prove that clr, Conprogs, and Cumclr are primitive recursive.

Theorem 9.7. There is a primitive recursive function c such that

ϕ_{c(x,y)} = ϕ_x ∘ ϕ_y.

Proof. If both x and y code programs, then ϕ_x ∘ ϕ_y can be computed as follows: run P_y, clear all registers but R1, then run P_x. Otherwise, let loop be the index of the infinite loop program:

c(x, y) = Conprogs(y, Conprogs(Cumclr(y), x))   if PROG(x) and PROG(y),
          loop                                  otherwise.

9.5 A Simple Function Not Known to be Computable

The 3n + 1 problem proposed by Collatz around 1937 is the following: given any positive integer n ≥ 1, construct the sequence c_i(n) as follows, starting with i = 1:

c_1(n) = n
c_{i+1}(n) = c_i(n)/2       if c_i(n) is even,
             3c_i(n) + 1    if c_i(n) is odd.

Observe that for n = 1, we get the infinite periodic sequence

1 → 4 → 2 → 1 → 4 → 2 → 1 → ⋯,

so we may assume that we stop the first time that the sequence c_i(n) reaches the value 1 (if it actually does). Such an index i is called the stopping time of the sequence. And this is the problem:

Conjecture (Collatz): For any starting integer value n ≥ 1, the sequence (c_i(n)) always reaches 1.

Starting with n = 3, we get the sequence

3 → 10 → 5 → 16 → 8 → 4 → 2 → 1.

Starting with n = 5, we get the sequence

5 → 16 → 8 → 4 → 2 → 1.

Starting with n = 6, we get the sequence

6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1.

Starting with n = 7, we get the sequence

7 → 22 → 11 → 34 → 17 → 52 → 26 → 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1.

One might be surprised to find that for n = 27, it takes 111 steps to reach 1, and for n = 97, it takes 118 steps. I computed the stopping times for n up to 10⁷ and found that the largest stopping time, 525 (524 steps), is obtained for n = 837799. The terms of this sequence reach very large values. The graph of the sequence s(837799) is shown in Figure 9.1.

Figure 9.1: Graph of the sequence for n = 837799.

We can define the partial computable function C (with positive integer inputs) defined by

C(n) = the smallest i such that c_i(n) = 1, if it exists.

Then the Collatz conjecture is equivalent to asserting that the function C is a (total) computable function. The graph of the function C for 1 ≤ n ≤ 10⁷ is shown in Figure 9.2.

So far, the conjecture remains open. It has been checked by computer for a very large range of integers.
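The recurrence is immediate to transcribe. The sketch below counts steps rather than terms (so it returns C(n) − 1, e.g. 524 for n = 837799, whose stopping time is 525); like C itself, it is only guaranteed to terminate on inputs for which the conjecture has been verified.

```python
def collatz_steps(n):
    # Number of steps for the sequence c_i(n) to first reach 1;
    # terminates only if the Collatz conjecture holds for n.
    assert n >= 1
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

This reproduces the figures quoted above: 8 steps from 6, 111 from 27, 118 from 97, and 524 from 837799.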

Figure 9.2: Graph of the function C for 1 ≤ n ≤ 10⁷.

9.6 A Non-Computable Function; Busy Beavers

Total functions that are not computable must grow very fast and thus are very complicated. Yet, in 1962, Radó published a paper in which he defined two functions Σ and S (involving computations of Turing machines) that are total and not computable.

Consider Turing machines with a tape alphabet Γ = {1, B} with two symbols (B being the blank). We also assume that these Turing machines have a special final state q_F, which is a blocking state (there are no transitions from q_F). We do not count this state when counting the number of states of such Turing machines. The game is to run such Turing machines with a fixed number of states n, starting on the blank tape, with the goal of producing the maximum number of (not necessarily consecutive) ones (1).

Definition 9.6. The function Σ (defined on the positive natural numbers) is defined as the maximum number Σ(n) of (not necessarily consecutive) 1's written on the tape after a Turing machine with n ≥ 1 states started on the blank tape halts. The function S is defined as the maximum number S(n) of moves that can be made by a Turing machine of the above type with n states before it halts, started on the blank tape.

A Turing machine with n states that writes the maximum number Σ(n) of 1's when started on the blank tape is called a busy beaver.

Busy beavers are hard to find, even for small n. First, it can be shown that the number of distinct Turing machines of the above kind with n states is (4(n + 1))^{2n}. Second, since it is undecidable whether a Turing machine halts on a given input, it is hard to tell which machines loop or halt after a very long time. Here is a summary of what is known for 1 ≤ n ≤ 6. Observe that the exact values of Σ(5), Σ(6), S(5) and S(6) are unknown.

n    Σ(n)                  S(n)
1    1                     1
2    4                     6
3    6                     21
4    13                    107
5    ≥ 4098                ≥ 47,176,870
6    ≥ 95,524,079          ≥ 8,690,333,381,690,951
6    ≥ 3.5 × 10^{18267}    ≥ 7.4 × 10^{36534}

The first entry in the table for n = 6 corresponds to a machine due to Heiner Marxen (1999). This record was surpassed by Pavel Kropitz in 2010, which corresponds to the second entry for n = 6.

The machines achieving the record in 2017 for n = 4, 5, 6 are shown below, where the blank is denoted ∗ instead of B, and where the special halting state is denoted H.

4-state busy beaver:

     A          B          C          D
∗    (1, R, B)  (1, L, A)  (1, R, H)  (1, R, D)
1    (1, L, B)  (∗, L, C)  (1, L, D)  (∗, R, A)

The above machine outputs 13 ones in 107 steps.

5-state best contender:

     A          B          C          D          E
∗    (1, R, B)  (1, R, C)  (1, R, D)  (1, L, A)  (1, R, H)
1    (1, L, C)  (1, R, B)  (∗, L, E)  (1, L, D)  (∗, L, A)

The above machine outputs 4098 ones in 47,176,870 steps.
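The tables above are easy to execute mechanically. A small Python simulator (a sketch; the dictionary encoding of the table and the step cap are mine) can confirm the figures quoted for the 4-state machine.

```python
def run_tm(delta, max_steps=10**7):
    # Simulate a {1, blank} Turing machine started on the blank tape.
    # delta maps (state, symbol) to (write, move, next_state); the blank
    # is "*", moves are -1 (left) and +1 (right), "H" is the halting state.
    # Returns (number of steps taken, number of 1's left on the tape).
    tape, pos, state, steps = {}, 0, "A", 0
    while state != "H" and steps < max_steps:
        write, move, state = delta[(state, tape.get(pos, "*"))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(1 for s in tape.values() if s == "1")

# The 4-state busy beaver given above.
bb4 = {
    ("A", "*"): ("1", +1, "B"), ("A", "1"): ("1", -1, "B"),
    ("B", "*"): ("1", -1, "A"), ("B", "1"): ("*", -1, "C"),
    ("C", "*"): ("1", +1, "H"), ("C", "1"): ("1", -1, "D"),
    ("D", "*"): ("1", +1, "D"), ("D", "1"): ("*", +1, "A"),
}
```

Running the 5-state and 6-state contenders the same way is possible in principle, but they take tens of millions to astronomically many steps, which is exactly why the records are so hard to establish.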

6-state contender (Heiner Marxen):

     A          B          C          D          E          F
∗    (1, R, B)  (1, L, C)  (∗, R, F)  (1, R, A)  (1, L, H)  (∗, L, A)
1    (1, R, A)  (1, L, B)  (1, L, D)  (∗, L, E)  (1, L, F)  (∗, L, C)

The above machine outputs 95,524,079 ones in 8,690,333,381,690,951 steps.

6-state best contender (Pavel Kropitz):

     A          B          C          D          E          F
∗    (1, R, B)  (1, R, C)  (1, L, D)  (1, R, E)  (1, L, A)  (1, L, H)
1    (1, L, E)  (1, R, F)  (∗, R, B)  (∗, L, C)  (∗, R, D)  (1, R, C)

The above machine outputs at least 3.5 × 10^{18267} ones!

The reason why it is so hard to compute Σ and S is that they are not computable!

Theorem 9.8. The functions Σ and S are total functions that are not computable (not recursive).

Proof sketch. The proof consists in showing that Σ (and similarly for S) eventually outgrows any computable function. More specifically, we claim that for every computable function f, there is some positive integer k_f such that

Σ(n + k_f) ≥ f(n)   for all n ≥ 0.

We simply have to pick k_f to be the number of states of a Turing machine M_f computing f. Then we can create a Turing machine M_{n,f} that works as follows. Using n of its states, it writes n ones on the tape, and then it simulates M_f with input 1ⁿ. Since the output of M_{n,f} started on the blank tape consists of f(n) ones, and since Σ(n + k_f) is the maximum number of ones that a Turing machine with n + k_f states will output when it stops, we must have

Σ(n + k_f) ≥ f(n)   for all n ≥ 0.

Next observe that Σ(n) < Σ(n + 1), because we can create a Turing machine with n + 1 states which simulates a busy beaver machine with n states, and then writes an extra 1 when the busy beaver stops, by making a transition to the (n + 1)th state. It follows immediately that if m < n then Σ(m) < Σ(n).

If Σ were computable, then so would be the function g given by g(n) = Σ(2n). By the above, we would have

Σ(n + k_g) ≥ g(n) = Σ(2n)   for all n ≥ 0,

and for n > k_g, since 2n > n + k_g, we would have Σ(n + k_g) < Σ(2n), contradicting the fact that Σ(n + k_g) ≥ Σ(2n).

Since by definition S(n) is the maximum number of moves that can be made by a Turing machine of the above type with n states before it halts, S(n) ≥ Σ(n). Then the same reasoning as above shows that S is not a computable function.

The zoo of computable and non-computable functions is illustrated in Figure 9.3.

Figure 9.3: Computability Classification of Functions (primitive recursive functions such as add, mult, exp, and supexp, sitting inside the total computable functions, sitting inside the partial computable functions, with examples such as the 3x + 1 function, ϕ_univ, and the busy beaver functions, of which only initial cases have been computed).

80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES. 2.6 Finite State Automata With Output: Transducers

80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES. 2.6 Finite State Automata With Output: Transducers 80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES 2.6 Finite Stte Automt With Output: Trnsducers So fr, we hve only considered utomt tht recognize lnguges, i.e., utomt tht do not produce ny output on ny input

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh Lnguges nd Automt Finite Automt Informtics 2A: Lecture 3 John Longley School of Informtics University of Edinburgh jrl@inf.ed.c.uk 22 September 2017 1 / 30 Lnguges nd Automt 1 Lnguges nd Automt Wht is

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38 Theory of Computtion Regulr Lnguges (NTU EE) Regulr Lnguges Fll 2017 1 / 38 Schemtic of Finite Automt control 0 0 1 0 1 1 1 0 Figure: Schemtic of Finite Automt A finite utomton hs finite set of control

More information

Theory of Computation Regular Languages

Theory of Computation Regular Languages Theory of Computtion Regulr Lnguges Bow-Yw Wng Acdemi Sinic Spring 2012 Bow-Yw Wng (Acdemi Sinic) Regulr Lnguges Spring 2012 1 / 38 Schemtic of Finite Automt control 0 0 1 0 1 1 1 0 Figure: Schemtic of

More information

CS375: Logic and Theory of Computing

CS375: Logic and Theory of Computing CS375: Logic nd Theory of Computing Fuhu (Frnk) Cheng Deprtment of Computer Science University of Kentucky 1 Tble of Contents: Week 1: Preliminries (set lgebr, reltions, functions) (red Chpters 1-4) Weeks

More information

Finite Automata. Informatics 2A: Lecture 3. Mary Cryan. 21 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. Mary Cryan. 21 September School of Informatics University of Edinburgh Finite Automt Informtics 2A: Lecture 3 Mry Cryn School of Informtics University of Edinburgh mcryn@inf.ed.c.uk 21 September 2018 1 / 30 Lnguges nd Automt Wht is lnguge? Finite utomt: recp Some forml definitions

More information

Introduction to the Theory of Computation Some Notes for CIS511

Introduction to the Theory of Computation Some Notes for CIS511 Introduction to the Theory of Computtion Some Notes for CIS511 Jen Gllier Deprtment of Computer nd Informtion Science University of Pennsylvni Phildelphi, PA 19104, USA e-mil: jen@cis.upenn.edu c Jen Gllier

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 Automt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Problem (II) Chpter II.6.: Push Down Automt Remrk: This mteril is no longer tught nd not directly exm relevnt Anton Setzer (Bsed

More information

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!)

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!) CMSC 330: Orgniztion of Progrmming Lnguges DFAs, nd NFAs, nd Regexps (Oh my!) CMSC330 Spring 2018 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All

More information

1.4 Nonregular Languages

1.4 Nonregular Languages 74 1.4 Nonregulr Lnguges The number of forml lnguges over ny lphbet (= decision/recognition problems) is uncountble On the other hnd, the number of regulr expressions (= strings) is countble Hence, ll

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Mehryr Mohri Cournt Institute nd Google Reserch mohri@cims.nyu.com Preliminries Finite lphet Σ, empty string. Set of ll strings over

More information

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute Victor Admchik Dnny Sletor Gret Theoreticl Ides In Computer Science CS 5-25 Spring 2 Lecture 2 Mr 3, 2 Crnegie Mellon University Deterministic Finite Automt Finite Automt A mchine so simple tht you cn

More information

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun:

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun: CMPU 240 Lnguge Theory nd Computtion Spring 2019 NFAs nd Regulr Expressions Lst clss: Introduced nondeterministic finite utomt with -trnsitions Tody: Prove n NFA- is no more powerful thn n NFA Introduce

More information

1.3 Regular Expressions

1.3 Regular Expressions 56 1.3 Regulr xpressions These hve n importnt role in describing ptterns in serching for strings in mny pplictions (e.g. wk, grep, Perl,...) All regulr expressions of lphbet re 1.Ønd re regulr expressions,

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Finite Automata-cont d

Finite Automata-cont d Automt Theory nd Forml Lnguges Professor Leslie Lnder Lecture # 6 Finite Automt-cont d The Pumping Lemm WEB SITE: http://ingwe.inghmton.edu/ ~lnder/cs573.html Septemer 18, 2000 Exmple 1 Consider L = {ww

More information

Finite-State Automata: Recap

Finite-State Automata: Recap Finite-Stte Automt: Recp Deepk D Souz Deprtment of Computer Science nd Automtion Indin Institute of Science, Bnglore. 09 August 2016 Outline 1 Introduction 2 Forml Definitions nd Nottion 3 Closure under

More information

Closure Properties of Regular Languages

Closure Properties of Regular Languages Closure Properties of Regulr Lnguges Regulr lnguges re closed under mny set opertions. Let L 1 nd L 2 e regulr lnguges. (1) L 1 L 2 (the union) is regulr. (2) L 1 L 2 (the conctention) is regulr. (3) L

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

Non-Deterministic Finite Automata. Fall 2018 Costas Busch - RPI 1

Non-Deterministic Finite Automata. Fall 2018 Costas Busch - RPI 1 Non-Deterministic Finite Automt Fll 2018 Costs Busch - RPI 1 Nondeterministic Finite Automton (NFA) Alphbet ={} q q2 1 q 0 q 3 Fll 2018 Costs Busch - RPI 2 Nondeterministic Finite Automton (NFA) Alphbet

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

This lecture covers Chapter 8 of HMU: Properties of CFLs

This lecture covers Chapter 8 of HMU: Properties of CFLs This lecture covers Chpter 8 of HMU: Properties of CFLs Turing Mchine Extensions of Turing Mchines Restrictions of Turing Mchines Additionl Reding: Chpter 8 of HMU. Turing Mchine: Informl Definition B

More information

NFAs continued, Closure Properties of Regular Languages

NFAs continued, Closure Properties of Regular Languages Algorithms & Models of Computtion CS/ECE 374, Fll 2017 NFAs continued, Closure Properties of Regulr Lnguges Lecture 5 Tuesdy, Septemer 12, 2017 Sriel Hr-Peled (UIUC) CS374 1 Fll 2017 1 / 31 Regulr Lnguges,

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Eugene Weinstein Google, NYU Cournt Institute eugenew@cs.nyu.edu Slide Credit: Mehryr Mohri Preliminries Finite lphet, empty string.

More information

Automata and Languages

Automata and Languages Automt nd Lnguges Prof. Mohmed Hmd Softwre Engineering Lb. The University of Aizu Jpn Grmmr Regulr Grmmr Context-free Grmmr Context-sensitive Grmmr Regulr Lnguges Context Free Lnguges Context Sensitive

More information

CSCI FOUNDATIONS OF COMPUTER SCIENCE

CSCI FOUNDATIONS OF COMPUTER SCIENCE 1 CSCI- 2200 FOUNDATIONS OF COMPUTER SCIENCE Spring 2015 My 7, 2015 2 Announcements Homework 9 is due now. Some finl exm review problems will be posted on the web site tody. These re prcqce problems not

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Deterministic Finite-State Automata

Deterministic Finite-State Automata Deterministic Finite-Stte Automt Deepk D Souz Deprtment of Computer Science nd Automtion Indin Institute of Science, Bnglore. 12 August 2013 Outline 1 Introduction 2 Exmple DFA 1 DFA for Odd number of

More information

5.1 Definitions and Examples 5.2 Deterministic Pushdown Automata

5.1 Definitions and Examples 5.2 Deterministic Pushdown Automata CSC4510 AUTOMATA 5.1 Definitions nd Exmples 5.2 Deterministic Pushdown Automt Definitions nd Exmples A lnguge cn be generted by CFG if nd only if it cn be ccepted by pushdown utomton. A pushdown utomton

More information

11.1 Finite Automata. CS125 Lecture 11 Fall Motivation: TMs without a tape: maybe we can at least fully understand such a simple model?

11.1 Finite Automata. CS125 Lecture 11 Fall Motivation: TMs without a tape: maybe we can at least fully understand such a simple model? CS125 Lecture 11 Fll 2016 11.1 Finite Automt Motivtion: TMs without tpe: mybe we cn t lest fully understnd such simple model? Algorithms (e.g. string mtching) Computing with very limited memory Forml verifiction

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-*

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-* Regulr Expressions (RE) Regulr Expressions (RE) Empty set F A RE denotes the empty set Opertion Nottion Lnguge UNIX Empty string A RE denotes the set {} Alterntion R +r L(r ) L(r ) r r Symol Alterntion

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Convert the NFA into DFA

For each NFA we can find a DFA accepting the same language. The number of states of the DFA could be exponential in the number of states of the NFA, but in practice this worst case occurs rarely. Algorithm:
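The algorithm the excerpt refers to is the subset construction. Here is a hedged sketch in Python (the NFA has no ε-moves; the encoding and example are my own, not from the excerpt):

```python
# Subset construction: build a DFA equivalent to a given NFA.
# States of the DFA are frozensets of NFA states.

def nfa_to_dfa(nfa_delta, start, alphabet):
    start_set = frozenset([start])
    dfa_delta, seen = {}, {start_set}
    todo = [start_set]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # Union of the NFA moves of every state in S on symbol a.
            T = frozenset(r for q in S for r in nfa_delta.get((q, a), ()))
            dfa_delta[S, a] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa_delta, seen

# NFA for strings over {a,b} ending in "ab": state 0 loops, guesses the end.
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
dfa_delta, dfa_states = nfa_to_dfa(nfa, 0, 'ab')
print(len(dfa_states))  # 3 reachable DFA states, far fewer than 2**3
```

Only reachable subsets are built, which is why the exponential worst case rarely shows up in practice.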

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Department of Computer Science, Australian National University. COMP2600 Formal Methods for Software Engineering, Semester 2, 2016. Assignment 1: Automata, Languages, and Computability. Sample Solutions. Finite State Automata and Regular Languages.

Designing finite automata II

Problem: Design a DFA A such that L(A) consists of all strings of a's and b's which are of length 3n, for n = 0, 1, 2, ... (1) Determine what to remember about the input string; assign a state to each of
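The design recipe in this excerpt (decide what to remember, assign a state to each possibility) can be sketched directly in code. A minimal illustration, with my own names, remembering only the length mod 3:

```python
# DFA over {a,b} accepting exactly the strings whose length is a multiple
# of 3. The three states 0, 1, 2 remember the length mod 3 seen so far.

def accepts_length_3n(w):
    state = 0
    for c in w:
        assert c in 'ab'           # alphabet is {a, b}
        state = (state + 1) % 3    # every symbol advances the counter
    return state == 0              # accept iff length ≡ 0 (mod 3)

print(accepts_length_3n(''))       # True: 0 is a multiple of 3
print(accepts_length_3n('aba'))    # True
print(accepts_length_3n('ab'))     # False
```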

Java II Finite Automata I

Java II: Finite Automata I. Bernd Kiefer, Bernd.Kiefer@dfki.de, Deutsches Forschungszentrum für künstliche Intelligenz. Finite Automata I, p. 1/13. Processing Regular Expressions: we already learned about Java's regular expression

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

York University CSE 2, Unit 3: DFA Classes. Converting between DFA, NFA, Regular Expressions, and Extended Regular Expressions. Instructor: Jeff Edmonds. Don't cheat by looking at these answers prematurely. 1. For each

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick

Intermediate Math Circles, Wednesday, November 14, 2018: Finite Automata II. Nickolas Rollick, nrollick@uwaterloo.ca. Regular Languages: last time, we were introduced to the idea of a DFA (deterministic finite automaton), one

Nondeterminism. Nondeterministic Finite Automata. Example: Moves on a Chessboard. Nondeterminism (2) Example: Chessboard (2) Formal NFA

Nondeterminism. Nondeterministic Finite Automata. Subset Construction. A nondeterministic finite automaton has the ability to be in several states at once. Transitions from a state on an input symbol can
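"Being in several states at once" can be simulated directly by carrying the set of possible current states through the input. A small sketch, with an illustrative NFA of my own (not from the excerpt):

```python
# Direct NFA simulation: after each symbol, `current` is the set of states
# the machine could be in. Accept if any possible state is final.

def nfa_accepts(delta, start, finals, w):
    current = {start}
    for c in w:
        current = {r for q in current for r in delta.get((q, c), ())}
    return bool(current & finals)

# NFA over {0,1} accepting strings that end in "01":
# 's' loops on everything and nondeterministically guesses the final "01".
delta = {('s', '0'): {'s', 'p'}, ('s', '1'): {'s'}, ('p', '1'): {'q'}}

print(nfa_accepts(delta, 's', {'q'}, '1101'))  # True
print(nfa_accepts(delta, 's', {'q'}, '110'))   # False
```

This is the subset construction done "on the fly": the sets computed here are exactly the DFA states the construction would build.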

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata. Julian Salamanca (and Jurriaan Rot), December 19, 2016. In this lecture, we will study the concept of equations for deterministic automata. The notes are self-contained

12.1 Nondeterminism. Nondeterministic Finite Automata. CS125 Lecture 12 Fall 2014

CS125 Lecture 12, Fall 2014. 12.1 Nondeterminism. The idea of nondeterministic computations is to allow our algorithms to make guesses, and only require that they accept when the guesses are correct. For example, a simple

More on automata. Michael George. March 24 April 7, 2014

More on automata. Michael George. March 24 to April 7, 2014. 1 Automata constructions. Now that we have a formal model of a machine, it is useful to make some general constructions. 1.1 DFA Union / Product construction. Suppose
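The union/product construction mentioned in this excerpt runs two DFAs in lockstep on pairs of states. A hedged sketch (the encoding and the two example DFAs are my own):

```python
# DFA product construction: simulate two DFAs simultaneously. With the
# accepting condition "either accepts" this recognizes the union of the
# two languages; with "both accept", the intersection.

def product_accepts(d1, s1, f1, d2, s2, f2, w, mode='union'):
    q1, q2 = s1, s2
    for c in w:
        q1, q2 = d1[q1, c], d2[q2, c]   # advance both machines in lockstep
    if mode == 'union':
        return q1 in f1 or q2 in f2
    return q1 in f1 and q2 in f2        # intersection

# DFA 1: even number of a's. DFA 2: length divisible by 3. Both over {a,b}.
d1 = {(0, 'a'): 1, (1, 'a'): 0, (0, 'b'): 0, (1, 'b'): 1}
d2 = {(i, c): (i + 1) % 3 for i in range(3) for c in 'ab'}

print(product_accepts(d1, 0, {0}, d2, 0, {0}, 'aab', mode='intersection'))  # True
```

The full construction materializes the pair states explicitly; simulating on demand, as here, avoids building the whole product.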

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm, October 23, 2012. This is a closed-book examination. You may use any result from lecture, Sipser, problem sets, or section, as long as you quote it clearly. The alphabet is

Non Deterministic Automata. Linz: Nondeterministic Finite Accepters, page 51

Nondeterministic Finite Accepter (NFA). Alphabet = {a}. Two choices

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory. Course Notes Part II: The Recognition Problem (II). Chapter II.5: Properties of Context Free Grammars (14). Anton Setzer (based on a book draft by J. V. Tucker and K. Stephenson)

Handout: Natural deduction for first order logic

MATH 457 Introduction to Mathematical Logic, Spring 2016, Dr Jason Rute. Handout: Natural deduction for first order logic. We will extend our natural deduction rules for sentential logic to first order logic. These notes

1 Nondeterministic Finite Automata

Suppose in life, whenever you had a choice, you could try both possibilities and live your life. At the end, you would go back and choose the one that worked out the best. Then you

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

5/10/16. Automata and Languages. Grammar: Regular Grammar, Context-free Grammar, Context-sensitive Grammar. Prof. Mohamed Hamada, Software Engineering Lab, The University of Aizu, Japan. Regular Languages, Context Free Languages, Context Sensitive

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science

CSCI 340: Computational Models. Kleene's Theorem, Chapter 7. Department of Computer Science. Unification: in 1954, Kleene presented (and proved) a theorem which (in our version) states that if a language can be defined by any

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA)

CHAPTER 1: Regular Languages. Contents: Finite Automata (FA or DFA): definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA): definitions, equivalence of NFAs and DFAs, closure under regular

First Midterm Examination

24-25 Fall Semester, First Midterm Examination. 1) Give the state diagram of a DFA that recognizes the language A over alphabet Σ = {a, b} where A = {w | w contains … or …}. 2) The following DFA recognizes the language B over alphabet

1 Structural induction

Discrete Structures Prelim 2 sample questions: Solutions. CS2800, questions selected for Spring 2018. 1 Structural induction. 1. We define a set S of functions from Z to Z inductively as follows: Rule 1. For any

CISC 4090 Theory of Computation

9/6/28. Stereotypical computer. CISC 4090 Theory of Computation: Finite state machines & Regular languages. Professor Daniel Leeds, dleeds@fordham.edu, JMH 332. Central processing unit (CPU) performs all the instructions

3 Regular expressions

Given an alphabet Σ, a language is a set of words L ⊆ Σ*. So far we were able to describe languages either by using set theory (i.e. enumeration or comprehension) or by an automaton. In this section we shall
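A third way to describe a language, as this excerpt goes on to introduce, is a regular expression. Python's `re` module writes alternation as `|` and the Kleene star as `*`, so a textbook expression like (a+b)*abb can be tested directly (the example is my own illustration):

```python
# Membership test for the language of strings over {a,b} ending in "abb",
# described by the regular expression (a|b)*abb.
import re

lang = re.compile(r'(a|b)*abb')

print(bool(lang.fullmatch('aababb')))  # True: the string ends in abb
print(bool(lang.fullmatch('abab')))    # False
```

`fullmatch` (rather than `search`) is what corresponds to language membership: the whole word must be generated by the expression.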

12.1 Nondeterminism. Nondeterministic Finite Automata. CS125 Lecture 12 Fall 2016

CS125 Lecture 12, Fall 2016. 12.1 Nondeterminism. The idea of nondeterministic computations is to allow our algorithms to make guesses, and only require that they accept when the guesses are correct. For example, a simple

State Minimization for DFAs

State Minimization for DFAs. Read K & S 2.7. Do Homework 10. Consider: is this a minimal machine? Step (1): get rid of unreachable states. Step (2): get rid

How to simulate Turing machines by invertible one-dimensional cellular automata

How to simulate Turing machines by invertible one-dimensional cellular automata. Jean-Christophe Dubacq, Département de Mathématiques et d'Informatique, École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

University of Southern California, Computer Science Department. Compiler Design, Fall: Lexical Analysis, Sample Exercises and Solutions. Prof. Pedro C. Diniz. USC / Information Sciences Institute, 4676 Admiralty Way, Suite
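Lexical analysis, the topic of this excerpt, is a standard application of regular expressions: a scanner classifies each maximal piece of the input by which expression it matches. A miniature sketch (token names and the grammar are my own illustration, not from the excerpt):

```python
# A tiny scanner built from regular expressions. Each named group is one
# token class; re.finditer walks the input and m.lastgroup tells us which
# class matched.
import re

TOKEN_RE = re.compile(
    r'(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[+*()=-])|(?P<WS>\s+)')

def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != 'WS':            # discard whitespace tokens
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize('x1 = 42 + y'))
# [('ID', 'x1'), ('OP', '='), ('NUM', '42'), ('OP', '+'), ('ID', 'y')]
```

Production lexers compile all the token expressions into a single DFA, which is exactly the DFA/NFA theory of the surrounding excerpts put to work.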

CSC 473 Automata, Grammars & Languages 11/9/10

CSC 473 Automata, Grammars & Languages, 11/9/10. Discourse 06: Decidability and Undecidability. Decidable Problems for Regular Languages. Theorem 4.1 (Membership/Acceptance Problem for DFAs)

Regular expressions, Finite Automata, transition graphs are all the same!!

CSI 3104 / Winter 2011: Introduction to Formal Languages. Chapter 7: Kleene's Theorem. Regular expressions, Finite Automata, transition graphs are all the same!! Dr. Nejib Zaguia, CSI3104-W11

Talen en Automaten Test 1, Mon 7 th Dec, h45 17h30

Talen en Automaten. Test 1, Mon 7th Dec 2015, 15h45 to 17h30. This test consists of four exercises over 5 pages. Explain your approach, and write your answer to each exercise on a separate page. You can score a maximum of 100

Lecture 08: Feb. 08, 2019

4CS4-6: Theory of Computation (Closure on Regular Languages, regex to NDFA, DFA to regex). Prof. K. R. Chowdhary, Professor of CS. Lecture 08: Feb. 08, 2019. Disclaimer: these notes have not been subjected to the usual scrutiny

NFAs continued, Closure Properties of Regular Languages

Algorithms & Models of Computation, CS/ECE 374, Spring 2019. NFAs continued, Closure Properties of Regular Languages. Lecture 5, Tuesday, January 29, 2019. Regular Languages, DFAs, NFAs. Languages accepted by DFAs, NFAs, and regular expressions

Non-deterministic Finite Automata

Non-deterministic Finite Automata: Eliminating non-determinism. Radboud University Nijmegen. H. Geuvers and T. van Laarhoven, Institute for Computing and Information Sciences, Intelligent


Part 5 out of 5. Automata & languages. A primer on the Theory of Computation. Last week was all about. a superset of Regular Languages

Automata & languages: A primer on the Theory of Computation. Laurent Vanbever, www.vanbever.eu. Part 5 out of 5. ETH Zürich (D-ITET), October 19, 2017. Last week was all about Context-Free Languages. Context-Free Languages are a superset

CS 330 Formal Methods and Models

CS 330 Formal Methods and Models. Dan Richards, George Mason University, Spring 2017. Quiz Solutions. Quiz 1, Propositional Logic. Date: February 2. 1. Prove ((p → q) ∧ ¬q) → ¬p is a tautology (a) (3 pts) by truth table.

CS Deterministic Finite Automata 1

CS4 45: Deterministic Finite Automata. 1: Generators vs. Checkers. Regular expressions are one way to specify a formal language. String Generator: generates strings in the language. Deterministic Finite Automata (DFA) are another

Review of Riemann Integral

1 Review of Riemann Integral. In this chapter we review the definition of the Riemann integral of a bounded function f : [a, b] → R, and point out its limitations so as to be convinced of the necessity of a more general integral.

Lecture 09: Myhill-Nerode Theorem

CS 373: Theory of Computation. Madhusudan Parthasarathy. Lecture 09: Myhill-Nerode Theorem, 16 February 2010. In this lecture, we will see that every language has a unique minimal DFA. We will see this fact from two perspectives

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY. FLAC (15-453) - Spring L. Blum

15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY. The Pumping Lemma for Regular Languages and Regular Expressions. Tuesday Jan 21. Which of these are regular? B = {0^n 1^n : n ≥ 0}, C = {w : w has an equal number of

GNFA

DFA → RE; NFA → DFA; ε-NFA → REX; GNFA. Definition (GNFA): a generalized nondeterministic finite automaton (GNFA) is a graph whose edges are labeled by regular expressions, with a unique start state with in-degree 0, and a unique final state with

CS375: Logic and Theory of Computing

CS375: Logic and Theory of Computing. Fuhua (Frank) Cheng, Department of Computer Science, University of Kentucky. Table of Contents: Week 1: Preliminaries (set algebra, relations, functions) (read Chapters 1-4). Weeks

Automata Theory 101. Introduction. Outline. Introduction Finite Automata Regular Expressions ω-automata. Ralf Huuck.

Automata Theory 101. Ralf Huuck. Outline: Introduction; Finite Automata; Regular Expressions; ω-Automata. Session 1, 2006. Acknowledgement: some slides are based on Wolfgang Thomas' excellent

Agenda. Agenda. Regular Expressions. Examples of Regular Expressions. Regular Expressions (crash course) Computational Linguistics 1

CMSC/LING 723, LBSC 744. Kristy Hollingshead Seitz, Institute for Advanced Computer Studies, University of Maryland. Agenda: HW0 questions? Due Thursday before class! When in doubt, keep it simple... Lecture 2: 6

Formal Language and Automata Theory (CS21004)

Formal Language and Automata Theory (CS21004), Kharagpur. Table of Contents.

Revision Sheet. (a) Give a regular expression for each of the following languages:

Theoretical Computer Science (Bridging Course). Dr. G. D. Tipaldi, F. Boniardi. Winter Semester 2014/2015. Revision Sheet. University of Freiburg, Department of Computer Science. Question 1 (Finite Automata, 8 + 6 points)

Non-deterministic Finite Automata

Non-deterministic Finite Automata: From Regular Expressions to NFA-ε. Eliminating non-determinism. Radboud University Nijmegen. H. Geuvers and J. Rot, Institute for Computing and Information

PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA

PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA. Right Linear Languages. Right Linear Grammar: rules of the form A → αB, A → α (A, B ∈ V_N, α ∈ V_T⁺). Left Linear Grammar: rules of the form A → Bα, A → α

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

CS 301, Lecture 04: Regular Expressions. Stephen Checkoway. January 29, 2018. 1/35. Review from last time: NFA N = (Q, Σ, δ, q0, F) where δ : Q × Σ_ε → P(Q) maps a state and an alphabet symbol (or ε) to a set of states. We run an NFA

Chapter 2 Finite Automata

Chapter 2: Finite Automata. 2.1 Introduction. Finite automata: first model of the notion of effective procedure. (They also have many other applications.) The concept of finite automaton can be derived by examining what

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004

Advanced Calculus: MATH 410. Notes on Integrals and Integrability. Professor David Levermore, 17 October 2004. 1. Definite Integrals. In this section we revisit the definite integral that you were introduced to when

DISCRETE MATHEMATICS HOMEWORK 3 SOLUTIONS

DISCRETE MATHEMATICS 21228, Homework 3 Solutions, JC. Due in class Wednesday September 17. You may collaborate but must write up your solutions by yourself. Late homework will not be accepted. Homework must either

1 Structural induction, finite automata, regular expressions

Discrete Structures Prelim 2 sample questions. CS2800, questions selected for Spring 2017. 1 Structural induction, finite automata, regular expressions. 1. We define a set S of functions from Z to Z inductively as

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3

UNIFORM CONVERGENCE. Contents: 1. Uniform Convergence; 2. Properties of uniform convergence. Suppose f_n : Ω → R or f_n : Ω → C is a sequence of real or complex functions, and f_n → f as n → ∞ in some sense. Furthermore,

CS5371 Theory of Computation. Lecture 20: Complexity V (Polynomial-Time Reducibility)

CS5371 Theory of Computation. Lecture 20: Complexity V (Polynomial-Time Reducibility). Objectives: Polynomial Time Reducibility; prove the Cook-Levin Theorem. Previously, we learnt that if

Let's start with an example:

Finite Automata. Let's start with an example: here you see labeled circles that are states, and labeled arrows that are transitions. One of the states is marked "start". One of the states has a double circle; this is a terminal state

FABER Formal Languages, Automata and Models of Computation

DVA337 FABER: Formal Languages, Automata and Models of Computation, Lecture 5. School of Innovation, Design and Engineering, Mälardalen University, 2015. Recap of lecture 4: by definition / subset construction, DFA, NFA, state

5. (ΣΣ)* = {w | w is a string of even length}. 6. 11 ∪ 00 = {11, 00}. 7. (11 ∪ 00)Σ* = {w | w begins with either 11 or 00}. 8. (0 ∪ ε)1* = 01* ∪ 1*. 9.

Regular Expressions, Pumping Lemma, Right Linear Grammars. Ling 106, March 25, 2002. 1 Regular Expressions. A regular expression describes or generates a language: it is a kind of shorthand for listing the members of a language.
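Identities like the one numbered 8 above, (0 ∪ ε)1* = 01* ∪ 1*, can be sanity-checked mechanically by comparing the two expressions on all short strings. A small brute-force sketch using Python's `re` syntax (`|` for union, empty branch for ε); the check is my own illustration, not a proof:

```python
# Brute-force check of the identity (0|ε)1* = 01* ∪ 1* on all binary
# strings up to length 6. Agreement on short strings is evidence, not proof.
import re
from itertools import product

lhs = re.compile(r'(0|)1*')    # (0 ∪ ε) followed by 1*
rhs = re.compile(r'01*|1*')    # 01* ∪ 1*

for n in range(7):
    for bits in product('01', repeat=n):
        w = ''.join(bits)
        assert bool(lhs.fullmatch(w)) == bool(rhs.fullmatch(w))
print('identity holds on all strings up to length 6')
```

For regular expressions such a finite check can even be made into a proof (compare the minimal DFAs, or test up to a length bound derived from their state counts), which ties back to the minimization excerpts above.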

Non Deterministic Automata. Formal Languages and Automata - Yonsei CS 1

Nondeterministic Finite Accepter (NFA): we allow a set of possible moves instead of a unique move. Alphabet = {a}. Two choices. Formal Languages and Automata, Yonsei CS

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction. Alessandro Giua, Dip. di Ingegneria Elettrica ed Elettronica, Università di Cagliari, Piazza d'Armi, 09123 Cagliari, Italy. Phone: +39-070-675-5892, Fax: +39-070-675-5900

p-adic Egyptian Fractions

p-adic Egyptian Fractions. Contents: 1 Introduction; 2 Traditional Egyptian Fractions and Greedy Algorithm; 3 Set-up; 4 p-Greedy Algorithm; 5 p-Egyptian Traditional; 6 Conclusion. 1 Introduction. An Egyptian fraction