TREE AUTOMATA AND TREE GRAMMARS


by Joost Engelfriet

arXiv: v1 [cs.FL] 7 Oct 2015

DAIMI FN-10, April 1975

Institute of Mathematics, University of Aarhus, Department of Computer Science, Ny Munkegade, 8000 Aarhus C, Denmark

Preface

I wrote these lecture notes during my stay in Aarhus in the academic year 1974/75. As a young researcher I had a wonderful time at DAIMI, and I have always been happy to have had that early experience.

I wish to thank Heiko Vogler for his noble plan to move these notes into the digital world, and I am grateful to Florian Starke and Markus Napierkowski (and Heiko) for the excellent transformation of my hand-written manuscript into LaTeX. Apart from the reparation of errors and some cosmetical changes, the text of the lecture notes has not been changed.

Of course, many things have happened in tree language theory since then. In particular, most of the problems mentioned in these notes have been solved. The developments until 1984 are described in the book Tree Automata by Ferenc Gécseg and Magnus Steinby, and for recent developments I recommend the Appendix of the reissue of that book at arxiv.org/abs/

Joost Engelfriet, October 2015
LIACS, Leiden University, The Netherlands

Tree automata and tree grammars

To appreciate the theory of tree automata and tree grammars one should already be motivated by the goals and results of formal language theory. In particular one should be interested in "derivation trees". A derivation tree models the grammatical structure of a sentence in a (context-free) language. By considering only the bottom of the tree the sentence may be recovered from the tree.

The first idea in tree language theory is to generalize the notion of a finite automaton working on strings to that of a finite automaton operating on trees. It turns out that a large part of the theory of regular languages can rather easily be generalized to a theory of regular tree languages. Moreover, since a regular tree language is (almost) the same as the set of derivation trees of some context-free language, one obtains results about context-free languages by "taking the bottom" of results about regular tree languages.

The second idea in tree language theory is to generalize the notion of a generalized sequential machine (that is, a finite automaton with output) to that of a finite state tree transducer. Tree transducers are more complicated than string transducers since they are equipped with the basic capabilities of copying, deleting and reordering (of subtrees). The part of (tree) language theory that is concerned with translation of languages is mainly motivated by compiler writing (and, to a lesser extent, by natural linguistics). When considering bottoms of trees, finite state transducers are essentially the same as syntax-directed translation schemes. Results in this part of tree language theory treat the composition and decomposition of tree transformations, and the properties of those tree languages that can be obtained by finite state transformation of regular tree languages (or, taking bottoms, those languages that can be obtained by syntax-directed translation of context-free languages).

Thirdly there are, of course, many other ideas in tree language theory. In the literature one can find, for instance, context-free tree grammars, recognition of subsets of arbitrary algebras, tree walking automata, hierarchies of tree languages (obtained by iterating old ideas), decomposition of tree automata, Lindenmayer tree grammars, etc.

These lectures will be divided in the following five parts: (1) and (2) contain preliminaries, (3), (4) and (5) are the main parts.

(1) Introduction.
(2) Some basic definitions.
(3) Recognizable (= regular) tree languages.
(4) Finite state tree transformations.
(5) Whatever there is more to consider.

Part (5) is not contained in these notes; instead, some "Notes on the literature" are given at the end.

Contents

1 Introduction
2 Some basic definitions
3 Recognizable tree languages
  3.1 Finite tree automata and regular tree grammars
  3.2 Closure properties of recognizable tree languages
  3.3 Decidability
4 Finite state tree transformations
  4.1 Introduction: Tree transducers and semantics
  4.2 Top-down and bottom-up finite tree transducers
  4.3 Comparison of B and T, the nondeterministic case
  4.4 Decomposition and composition of bottom-up tree transformations
  4.5 Decomposition of top-down tree transformations
  4.6 Comparison of B and T, the deterministic case
  4.7 Top-down finite tree transducers with regular look-ahead
  4.8 Surface and target languages
5 Notes on the literature

1 Introduction

Our basic data type is the kind of tree used to express the grammatical structure of strings in a context-free language.

Example 1.1. Consider the context-free grammar $G = (N, \Sigma, R, S)$ with nonterminals $N = \{S, A, D\}$, terminals $\Sigma = \{a, b, d\}$, initial nonterminal $S$ and the set of rules $R$, consisting of the rules $S \to AD$, $A \to aAb$, $A \to bAa$, $A \to AA$, $A \to \lambda$, $D \to Ddd$ and $D \to d$ (we use $\lambda$ to denote the empty string). The string $baabddd \in \Sigma^*$ can be generated by $G$ and has the following derivation tree (see [Sal, II.6], [A&U, 0.5 and 2.4.1]):

[Figure: derivation tree with root S, whose direct subtrees are rooted at A and D. In the string notation introduced in Section 2 (see Example 2.7) the tree is S[A[A[bA[e]a]A[aA[e]b]]D[D[d]dd]].]

Note that we use $e$ as a symbol standing for the empty string $\lambda$. The string $baabddd$ is called the "yield" or "result" of the derivation tree.

Thus, in graph terminology, our trees are finite (finite number of nodes and branches), directed (the branches are "growing downwards"), rooted (there is a node, the root, with no branches entering it), ordered (the branches leaving a node are ordered from left to right) and labeled (the nodes are labeled with symbols from some alphabet).

The following intuitive terminology will be used:

- the rank (or out-degree) of a node is the number of branches leaving it (note that the in-degree of a node is always 1, except for the root which has in-degree 0)
- a leaf is a node with rank 0
- the top of a tree is its root
- the bottom (or frontier) of a tree is the set (or sequence) of its leaves
- the yield (or result, or frontier) of a tree is the string obtained by writing the labels of its leaves (except the label $e$) from left to right
- a path through a tree is a sequence of nodes connected by branches ("leading downwards"); the length of the path is the number of its nodes minus one (that is, the number of its branches)
- the height (or depth) of a tree is the length of the longest path from the top to the bottom

- if there is a path of length $\geq 1$ (of length $= 1$) from a node $a$ to a node $b$ then $b$ is a descendant (direct descendant) of $a$ and $a$ is an ancestor (direct ancestor) of $b$
- a subtree of a tree is a tree determined by a node together with all its descendants; a direct subtree is a subtree determined by a direct descendant of the root of the tree; note that each tree is uniquely determined by the label of its root and the (possibly empty) sequence of its direct subtrees
- the phrases "bottom-up", "bottom-to-top" and "frontier-to-root" are used to indicate the direction from the leaves to the root, while the phrases "top-down", "top-to-bottom" and "root-to-frontier" are used to indicate the direction from the root to the leaves.

In derivation trees of context-free grammars each symbol may only label nodes of certain ranks. For instance, in the above example, $a$, $b$, $d$ and $e$ may only label leaves (nodes of rank 0), $A$ labels nodes with ranks 1, 2 and 3, $S$ labels nodes with rank 2, and $D$ nodes of rank 1 and 3 (these numbers being the lengths of the right hand sides of rules). Therefore, given some alphabet, we require the specification of a finite number of ranks for each symbol in the alphabet, and we restrict attention to those trees in which nodes of rank $k$ are labeled by symbols of rank $k$.

2 Some basic definitions

The mathematical definition of a tree may be given in several, equivalent, ways. We will define a tree as a special kind of string (others call this string a representation of the tree, see [A&U, 0.5.7]). Before doing so, let us define ranked alphabets.

Definition 2.1. An alphabet $\Sigma$ is said to be ranked if for each nonnegative integer $k$ a subset $\Sigma_k$ of $\Sigma$ is specified, such that $\Sigma_k$ is nonempty for a finite number of $k$'s only, and such that $\Sigma = \bigcup_{k \geq 0} \Sigma_k$. If $a \in \Sigma_k$, then we say that $a$ has rank $k$ (note that $a$ may have more than one rank). Usually we define a specific ranked alphabet $\Sigma$ by specifying those $\Sigma_k$ that are nonempty.

(Footnote to Definition 2.1: To be more precise one should define a ranked alphabet as a pair $(\Sigma, f)$, where $\Sigma$ is an alphabet and $f$ is a mapping from $\mathbb{N}$ into $\mathcal{P}(\Sigma)$ such that $\exists n\, \forall k \geq n : f(k) = \emptyset$, and then denote $f(k)$ by $\Sigma_k$ and $(\Sigma, f)$ by $\Sigma$. Note that $\mathbb{N} = \{0, 1, 2, \ldots\}$ is the set of natural numbers and that $\mathcal{P}(\Sigma)$ is the set of subsets of $\Sigma$.)

Example 2.2. The alphabet $\Sigma = \{a, b, +, -\}$ is made into a ranked alphabet by specifying $\Sigma_0 = \{a, b\}$, $\Sigma_1 = \{-\}$ and $\Sigma_2 = \{+, -\}$. (Think of negation and subtraction.)

Remark 2.3. Throughout our discussions we shall use the symbol $e$ as a special symbol, intuitively representing $\lambda$. Whenever $e$ belongs to a ranked alphabet, it is of rank 0.

Operations on ranked alphabets should be defined as for instance in the following definition.

Definition 2.4. Let $\Sigma$ and $\Delta$ be ranked alphabets. The union of $\Sigma$ and $\Delta$, denoted by $\Sigma \cup \Delta$, is defined by $(\Sigma \cup \Delta)_k = \Sigma_k \cup \Delta_k$, for all $k \geq 0$. We say that $\Sigma$ and $\Delta$ are equal, denoted by $\Sigma = \Delta$, if, for all $k \geq 0$, $\Sigma_k = \Delta_k$.

We now define the notion of tree. Let "[" and "]" be two symbols which are never elements of a ranked alphabet.

Definition 2.5. Given a ranked alphabet $\Sigma$, the set of trees over $\Sigma$, denoted by $T_\Sigma$, is the language over the alphabet $\Sigma \cup \{[, ]\}$ defined inductively as follows.
(i) If $a \in \Sigma_0$, then $a \in T_\Sigma$.
(ii) For $k \geq 1$, if $a \in \Sigma_k$ and $t_1, t_2, \ldots, t_k \in T_\Sigma$, then $a[t_1 t_2 \cdots t_k] \in T_\Sigma$.

Intuitively, $a$ is a tree with one node labeled "a", and $a[t_1 t_2 \cdots t_k]$ is the tree

[Figure: a root labeled $a$ with the direct subtrees $t_1, t_2, \ldots, t_k$ below it, from left to right.]

Example 2.6. Consider the ranked alphabet of Example 2.2. Then $+[-[-[ab]]a]$ is a tree over this alphabet, intuitively representing the tree

[Figure: the tree with root $+$, whose direct subtrees are the tree for $-[-[ab]]$ and the leaf $a$.]

which in its turn represents the expression $(-(a - b)) + a$ (note that the "official" tree is the prefix notation of this expression).

Example 2.7. Consider the ranked alphabet $\Delta$, where $\Delta_0 = \{a, b, d, e\}$, $\Delta_1 = \Delta_3 = \{A, D\}$ and $\Delta_2 = \{A, S\}$. A picture of the tree $S[A[A[bA[e]a]A[aA[e]b]]D[D[d]dd]]$ in $T_\Delta$ is given in Example 1.1.

Exercise 2.8. Take some ranked alphabet $\Sigma$ and show that $T_\Sigma$ is a context-free language over $\Sigma \cup \{[, ]\}$.

Our main aim will be to study several ways of constructively representing sets of trees and relations between trees. The basic terminology is the following.

Definition 2.9. Let $\Sigma$ be a ranked alphabet. A tree language over $\Sigma$ is any subset of $T_\Sigma$.
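Definition 2.5 can be tried out directly on strings. The following is a minimal sketch, assuming that every symbol is a single character and that a ranked alphabet is given as a dictionary from ranks to sets of symbols; the names `parse`, `is_tree_over` and `SIGMA` are illustrative and do not come from the notes.

```python
def parse(s):
    """Parse the string notation a[t1...tk] into (label, (subtree, ...)).

    Assumption of this sketch: every symbol is a single character.
    """
    def tree(i):
        if i >= len(s) or s[i] in "[]":
            raise ValueError(f"symbol expected at position {i}")
        label, i, kids = s[i], i + 1, []
        if i < len(s) and s[i] == "[":
            i += 1
            while i < len(s) and s[i] != "]":
                kid, i = tree(i)
                kids.append(kid)
            if i >= len(s) or not kids:
                raise ValueError("unbalanced or empty brackets")
            i += 1  # skip the closing "]"
        return (label, tuple(kids)), i

    t, end = tree(0)
    if end != len(s):
        raise ValueError("trailing symbols after the tree")
    return t


def is_tree_over(t, ranked):
    """Check that every node of rank k carries a symbol of rank k."""
    label, kids = t
    return (label in ranked.get(len(kids), set())
            and all(is_tree_over(k, ranked) for k in kids))


# Ranked alphabet of Example 2.2, with ASCII "-" standing for the minus sign.
SIGMA = {0: {"a", "b"}, 1: {"-"}, 2: {"+", "-"}}

t = parse("+[-[-[ab]]a]")                   # the tree of Example 2.6
print(is_tree_over(t, SIGMA))               # "+" has rank 2, "-" ranks 1 and 2
print(is_tree_over(parse("+[a]"), SIGMA))   # "+" has no rank 1
```

The nested-tuple form makes the remark after Definition 2.5 concrete: a tree is determined by the label of its root and the sequence of its direct subtrees.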

Definition 2.10. Let $\Sigma$ and $\Delta$ be ranked alphabets. A tree transformation from $T_\Sigma$ into $T_\Delta$ is any subset of $T_\Sigma \times T_\Delta$.

Exercise 2.11. Show that the context-free grammar $G = (N, \Sigma, R, S)$ with $N = \{S\}$, $\Sigma = \{a, b, [, ]\}$ and $R = \{S \to b[aS],\ S \to a\}$ generates a tree language over $\Delta$, where $\Delta_0 = \{a\}$ and $\Delta_2 = \{b\}$.

The above definition of "tree" (Definition 2.5) gives rise to the following principles of proof by induction and definition by induction for trees. (Note that each tree is, uniquely, either in $\Sigma_0$ or of the form $a[t_1 \cdots t_k]$.)

Principle 2.12. Principle of proof by induction (or recursion) on trees. Let $P$ be a property of trees (over $\Sigma$). If
(i) all elements of $\Sigma_0$ have property $P$, and
(ii) for each $k \geq 1$ and each $a \in \Sigma_k$, if $t_1, \ldots, t_k$ have property $P$, then $a[t_1 \cdots t_k]$ has property $P$,
then all trees in $T_\Sigma$ have property $P$.

Principle 2.13. Principle of definition by induction (or recursion) on trees. Suppose we want to associate a value $h(t)$ with each tree $t$ in $T_\Sigma$. Then it suffices to define $h(a)$ for all $a \in \Sigma_0$, and to show how to compute the value $h(a[t_1 \cdots t_k])$ from the values $h(t_1), \ldots, h(t_k)$. More formally expressed, given a set $O$ of objects, and
(i) for each $a \in \Sigma_0$, an object $o_a \in O$, and
(ii) for each $k \geq 1$ and each $a \in \Sigma_k$, a mapping $f^k_a : O^k \to O$,
there is exactly one mapping $h : T_\Sigma \to O$ such that
(i) $h(a) = o_a$ for all $a \in \Sigma_0$, and
(ii) $h(a[t_1 \cdots t_k]) = f^k_a(h(t_1), \ldots, h(t_k))$ for all $k \geq 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$.

Example 2.14. Let $\Sigma_0 = \{e\}$ and $\Sigma_1 = \{/\}$. The trees in $T_\Sigma$ are in an obvious one-to-one correspondence with the natural numbers. The above principles are the usual induction principles for these numbers.

To illustrate the use of the induction principles we give the following useful definitions.

Definition 2.15. The mapping yield from $T_\Sigma$ into $\Sigma_0^*$ is defined inductively as follows.
(i) For $a \in \Sigma_0$, $\mathrm{yield}(a) = a$ if $a \neq e$, and $\mathrm{yield}(a) = \lambda$ if $a = e$.
(ii) For $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $\mathrm{yield}(a[t_1 \cdots t_k]) = \mathrm{yield}(t_1) \cdot \mathrm{yield}(t_2) \cdots \mathrm{yield}(t_k)$. That is, the concatenation of $\mathrm{yield}(t_1), \ldots, \mathrm{yield}(t_k)$.
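Principle 2.13 says that a mapping on trees is fixed by its values on leaves and by one combining function per symbol and rank. A minimal sketch in Python, assuming trees are stored as nested tuples `(label, subtrees)`: the generic `fold` plays the role of the unique mapping $h$, and the yield of Definition 2.15 and the height of a tree (in the informal sense of Section 1) are two instances of it. The helper `n` and all function names are hypothetical.

```python
def fold(t, leaf, node):
    """The unique h with h(a) = leaf(a) and h(a[t1...tk]) = node(a, [h(t1), ..., h(tk)])."""
    label, kids = t
    if not kids:
        return leaf(label)
    return node(label, [fold(k, leaf, node) for k in kids])


def tree_yield(t):
    # Definition 2.15: a leaf contributes its label, except e, which contributes the empty string.
    return fold(t, lambda a: "" if a == "e" else a, lambda a, ys: "".join(ys))


def height(t):
    # Length of the longest path from the root to a leaf, counted in branches.
    return fold(t, lambda a: 0, lambda a, hs: 1 + max(hs))


def n(label, *kids):
    """Hypothetical helper: build the tree label[kids...] as a nested tuple."""
    return (label, kids)


# The derivation tree S[A[A[bA[e]a]A[aA[e]b]]D[D[d]dd]] of Examples 1.1 and 2.7.
t = n("S",
      n("A",
        n("A", n("b"), n("A", n("e")), n("a")),
        n("A", n("a"), n("A", n("e")), n("b"))),
      n("D", n("D", n("d")), n("d"), n("d")))

print(tree_yield(t))
print(height(t))

# Example 2.14: the tree /[/[e]] corresponds to the natural number 2.
two = n("/", n("/", n("e")))
print(fold(two, lambda a: 0, lambda a, vs: vs[0] + 1))
```

Writing both mappings through one `fold` is a design choice of the sketch: it mirrors the statement that $h$ is determined by the objects $o_a$ and the mappings $f^k_a$ alone.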

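Moreover, for a tree language $L \subseteq T_\Sigma$, we define $\mathrm{yield}(L) = \{\mathrm{yield}(t) \mid t \in L\}$. We shall sometimes abbreviate "yield" by "y".

Definition 2.16. The mapping height from $T_\Sigma$ into $\mathbb{N}$ is defined recursively as follows.
(i) For $a \in \Sigma_0$, $\mathrm{height}(a) = 0$.
(ii) For $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $\mathrm{height}(a[t_1 \cdots t_k]) = \max_{1 \leq i \leq k}(\mathrm{height}(t_i)) + 1$.

Example 2.17. As an example of a proof by induction on trees we show that, if $e \notin \Sigma_0$ and $\Sigma_1 = \emptyset$, then, for all $t \in T_\Sigma$, $\mathrm{height}(t) < |\mathrm{yield}(t)|$.

Proof. For $a \in \Sigma_0$, $\mathrm{height}(a) = 0$ and $|\mathrm{yield}(a)| = |a| = 1$ (since $a \neq e$). Now let $a \in \Sigma_k$ ($k \geq 2$) and assume (induction hypothesis) that $\mathrm{height}(t_i) < |\mathrm{yield}(t_i)|$ for $1 \leq i \leq k$. Then

$|\mathrm{yield}(a[t_1 \cdots t_k])| = \sum_{i=1}^{k} |\mathrm{yield}(t_i)|$ (Def. 2.15(ii))
$\geq \bigl(\sum_{i=1}^{k} \mathrm{height}(t_i)\bigr) + k$ (ind. hypothesis)
$\geq \max_{1 \leq i \leq k}(\mathrm{height}(t_i)) + 2$ ($k \geq 2$ and $\mathrm{height}(t_i) \geq 0$)
$> \mathrm{height}(a[t_1 \cdots t_k])$ (Def. 2.16(ii)).

Exercise 2.18. Let $\Sigma$ be a ranked alphabet such that $\Sigma_0 \cap \Sigma_k = \emptyset$ for all $k \geq 1$. Define a (string) homomorphism $h$ from $(\Sigma \cup \{[, ]\})^*$ into $\Sigma_0^*$ such that, for all $t \in T_\Sigma$, $h(t) = \mathrm{yield}(t)$.

Exercise 2.19. Give a recursive definition of the notion of "subtree", for instance as a mapping $\mathrm{sub} : T_\Sigma \to \mathcal{P}(T_\Sigma)$ such that $\mathrm{sub}(t)$ is the set of all subtrees of $t$. Give also an alternative definition of "subtree" in a more string-like fashion.

Exercise 2.20. Let $\mathrm{path}(t)$ denote the set of all paths from the top of $t$ to its bottom. Think of a formal definition for "path".

The generalization of formal language theory to formal tree language theory will come about by viewing a string as a special kind of tree and taking the obvious generalizations. To be able to view strings as trees we "turn them 90 degrees" to a vertical position, as follows.

Definition 2.21. A ranked alphabet $\Sigma$ is monadic if
(i) $\Sigma_0 = \{e\}$, and
(ii) for $k \geq 2$, $\Sigma_k = \emptyset$.
The elements of $T_\Sigma$ are called monadic trees.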

Thus a monadic ranked alphabet $\Sigma$ is fully determined by the alphabet $\Sigma_1$. Monadic trees obviously can be made to correspond to the strings in $\Sigma_1^*$. There are two ways to do this, depending on whether we read top-down or bottom-up:

$f_{td} : T_\Sigma \to \Sigma_1^*$ is defined by
(i) $f_{td}(e) = \lambda$
(ii) $f_{td}(a[t]) = a\, f_{td}(t)$ for $a \in \Sigma_1$ and $t \in T_\Sigma$

and $f_{bu} : T_\Sigma \to \Sigma_1^*$ is defined by
(i) $f_{bu}(e) = \lambda$
(ii) $f_{bu}(a[t]) = f_{bu}(t)\, a$ for $a \in \Sigma_1$ and $t \in T_\Sigma$.

(Obviously both $f_{td}$ and $f_{bu}$ are bijections.) Accordingly, when generalizing a string-concept to trees, we often have the choice between a top-down and a bottom-up generalization.

Example 2.22. The string alphabet $\Delta = \{a, b, c\}$ corresponds to the monadic alphabet $\Sigma$ with $\Sigma_0 = \{e\}$ and $\Sigma_1 = \Delta$. The tree

[Figure: a vertical chain with nodes a, b, c, b, e from top to bottom.]

in $T_\Sigma$ corresponds either to the string $abcb$ in $\Delta^*$ (top-down), or to the string $bcba$ in $\Delta^*$ (bottom-up). Note that, due to our "prefix definition" of trees (Definition 2.5), the above tree looks "top-down like" in its official form $a[b[c[b[e]]]]$. Obviously this is not essential.

Let us consider some basic operations on trees. A basic operation on strings is right-concatenation with one symbol (that is, for each symbol $a$ in the alphabet there is an operation $\mathrm{rc}_a$ such that, for each string $w$, $\mathrm{rc}_a(w) = wa$). Every string can uniquely be built up from the empty string by these basic operations (consider the way you write and read!). Generalizing bottom-up, the corresponding basic operations on trees, here called "top concatenation", are the following.

Definition 2.23. For each $a \in \Sigma_k$ ($k \geq 1$) we define the ($k$-ary) operation of top concatenation with $a$, denoted by $\mathrm{tc}^k_a$, to be the mapping from $T_\Sigma^k$ into $T_\Sigma$ such that, for all $t_1, \ldots, t_k \in T_\Sigma$,

$\mathrm{tc}^k_a(t_1, \ldots, t_k) = a[t_1 \cdots t_k]$.

Moreover, for tree languages $L_1, \ldots, L_k$, we define

$\mathrm{tc}^k_a(L_1, \ldots, L_k) = \{a[t_1 \cdots t_k] \mid t_i \in L_i \text{ for all } 1 \leq i \leq k\}$.
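The two readings of a monadic tree and the operation of top concatenation are short enough to write down on the string notation itself. A minimal sketch, assuming single-character symbols and well-formed monadic input; `from_string_bu` is a hypothetical name for the inverse of $f_{bu}$.

```python
def tc(a, *subtrees):
    """Top concatenation on the string notation: tc_a(t1, ..., tk) = a[t1...tk]."""
    if not subtrees:
        raise ValueError("top concatenation needs k >= 1 subtrees")
    return a + "[" + "".join(subtrees) + "]"


def f_td(t):
    """Read a monadic tree top-down: f_td(e) is empty, f_td(a[t]) = a f_td(t)."""
    return "" if t == "e" else t[0] + f_td(t[2:-1])


def f_bu(t):
    """Read a monadic tree bottom-up: f_bu(e) is empty, f_bu(a[t]) = f_bu(t) a."""
    return "" if t == "e" else f_bu(t[2:-1]) + t[0]


def from_string_bu(w):
    """Inverse of f_bu: each right-concatenation rc_a becomes a top concatenation tc_a."""
    t = "e"
    for a in w:
        t = tc(a, t)
    return t


t = "a[b[c[b[e]]]]"                   # the monadic tree of Example 2.22
print(f_td(t), f_bu(t))
print(from_string_bu("bcba") == t)
```

The loop in `from_string_bu` is the bottom-up generalization in miniature: a string is built by repeated right-concatenation, and the corresponding tree by repeated top concatenation.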

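Note that every tree can uniquely be built up from the elements of $\Sigma_0$ by repeated top concatenation.

The next basic operation on strings is concatenation. When viewed monadically, concatenation corresponds to substituting one vertical string into the $e$ of the other vertical string. In the general case, we may take one tree and substitute a tree into each leaf of the original tree, such that different trees may be substituted into leaves with different labels. Thus we obtain the following basic operation on trees.

Definition 2.24. Let $n \geq 1$, $a_1, \ldots, a_n \in \Sigma_0$ all different, and $s_1, \ldots, s_n \in T_\Sigma$. For $t \in T_\Sigma$, the tree concatenation of $t$ with $s_1, \ldots, s_n$ at $a_1, \ldots, a_n$, denoted by $t\langle a_1 \leftarrow s_1, \ldots, a_n \leftarrow s_n\rangle$, is defined recursively as follows.
(i) for $a \in \Sigma_0$, $a\langle a_1 \leftarrow s_1, \ldots, a_n \leftarrow s_n\rangle = s_i$ if $a = a_i$, and $= a$ otherwise
(ii) for $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $a[t_1 \cdots t_k]\langle\ldots\rangle = a[t_1\langle\ldots\rangle \cdots t_k\langle\ldots\rangle]$,
where $\langle\ldots\rangle$ abbreviates $\langle a_1 \leftarrow s_1, \ldots, a_n \leftarrow s_n\rangle$.
If, in particular, $n = 1$, then, for each $a \in \Sigma_0$ and $t, s \in T_\Sigma$, the tree $t\langle a \leftarrow s\rangle$ is also denoted by $t \cdot_a s$.

Example 2.25. Let $\Delta_0 = \{x, y, c\}$, $\Delta_2 = \{b\}$ and $\Delta_3 = \{a\}$. If $t = a[b[xy]xc]$, then $t\langle x \leftarrow b[cx], y \leftarrow c\rangle = a[b[b[cx]c]b[cx]c]$.

Exercise 2.26. Check that in the monadic case tree concatenation corresponds to string concatenation.

For tree languages tree concatenation is defined analogously.

Definition 2.27. Let $n \geq 1$, $a_1, \ldots, a_n \in \Sigma_0$ all different, and $L_1, \ldots, L_n \subseteq T_\Sigma$. For $L \subseteq T_\Sigma$ we define the tree concatenation of $L$ with $L_1, \ldots, L_n$ at $a_1, \ldots, a_n$, denoted by $L\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle$, as follows.
(i) for $a \in \Sigma_0$, $a\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle = L_i$ if $a = a_i$, and $= a$ otherwise
(ii) for $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $a[t_1 \cdots t_k]\langle\ldots\rangle = a[t_1\langle\ldots\rangle \cdots t_k\langle\ldots\rangle]$
(iii) for $L \subseteq T_\Sigma$, $L\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle = \bigcup_{t \in L} t\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle$.

(Footnote to Definition 2.27: As usual, given a string $w$, we use $w$ also to denote the language $\{w\}$. For tree languages $M_1, \ldots, M_k$ we also write $a[M_1 \cdots M_k]$ to denote $\mathrm{tc}^k_a(M_1, \ldots, M_k)$. This notation is fully justified since $a[M_1 \cdots M_k]$ is the (string) concatenation of the languages $a$, $[$, $M_1, \ldots, M_k$ and $]$!)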
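Because trees are strings, the tree concatenation of Definition 2.24 can be sketched as a single left-to-right pass over the string. The sketch below assumes single-character symbols and treats an occurrence of a symbol as a leaf exactly when it is not followed by "["; the names `is_leaf` and `tree_concat` are illustrative.

```python
def is_leaf(t, i):
    """True if position i of the tree string t holds a symbol occurrence of rank 0."""
    return t[i] not in "[]" and (i + 1 == len(t) or t[i + 1] != "[")


def tree_concat(t, subst):
    """Tree concatenation of t with the trees subst[a] at the leaf symbols a."""
    # One pass over the original string: substituted trees are not rescanned,
    # so the substitution is simultaneous, as in Definition 2.24.
    return "".join(subst.get(ch, ch) if is_leaf(t, i) else ch
                   for i, ch in enumerate(t))


# Example 2.25: substitute b[cx] for x and c for y in a[b[xy]xc].
print(tree_concat("a[b[xy]xc]", {"x": "b[cx]", "y": "c"}))

# Monadic case (Exercise 2.26): substituting c[e] for e appends c when read top-down.
print(tree_concat("a[b[e]]", {"e": "c[e]"}))
```

Note that the `x` inside the substituted tree `b[cx]` is left alone, and that a symbol with several ranks is only replaced where it occurs with rank 0.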

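If, in particular, $n = 1$, then, for each $a \in \Sigma_0$ and each $L_1, L_2 \subseteq T_\Sigma$, we denote $L_1\langle a \leftarrow L_2\rangle$ also by $L_1 \cdot_a L_2$.

Remarks 2.28.
(1) Obviously, if $L, L_1, \ldots, L_n$ are singletons, then Definition 2.27 is the same as Definition 2.24.
(2) Note that tree concatenation, as defined above, is "nondeterministic" in the sense that, for instance, to obtain $t\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle$ different elements of $L_1$ may be substituted at different occurrences of $a_1$ in $t$. "Deterministic" tree concatenation of $t$ with $L_1, \ldots, L_n$ at $a_1, \ldots, a_n$ could be defined as $\{t\langle a_1 \leftarrow s_1, \ldots, a_n \leftarrow s_n\rangle \mid s_i \in L_i \text{ for all } 1 \leq i \leq n\}$. In this case different occurrences of $a_1$ in $t$ should be replaced by the same element of $L_1$. It is clear that, in the case that $L_1, \ldots, L_n$ are singletons, this distinction cannot be made.

Intuitively, since trees are strings, tree concatenation is nothing else but ordinary string substitution, familiar from formal language theory (see, for instance, [Sal, I.3]). For completeness we give the definition of substitution of string languages.

Definition 2.29. Let $\Delta$ be an alphabet. Let $n \geq 1$, $a_1, \ldots, a_n \in \Delta$ all different and let $L_1, \ldots, L_n$ be languages over $\Delta$. For any $L \subseteq \Delta^*$, the substitution of $L_1, \ldots, L_n$ for $a_1, \ldots, a_n$ in $L$, denoted by $L\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle$, is the language over $\Delta$ defined as follows:
(i) $\lambda\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle = \lambda$
(ii) for $a \in \Delta$, $a\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle = L_i$ if $a = a_i$, and $= a$ otherwise
(iii) for $w \in \Delta^*$ and $a \in \Delta$, $wa\langle\ldots\rangle = w\langle\ldots\rangle \cdot a\langle\ldots\rangle$
(iv) for $L \subseteq \Delta^*$, $L\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle = \bigcup_{w \in L} w\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle$.
If $n = 1$, $L_1\langle a \leftarrow L_2\rangle$ will also be denoted as $L_1 \cdot_a L_2$. If $L_1, \ldots, L_n$ are singletons, then the substitution is called a homomorphism.

Exercise 2.30. Let $n \geq 1$, $a_1, \ldots, a_n \in \Sigma_0$ all different, $a_i \neq e$ for all $1 \leq i \leq n$, and $L, L_1, \ldots, L_n \subseteq T_\Sigma$. Prove that $\mathrm{yield}(L\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle) = \mathrm{yield}(L)\langle a_1 \leftarrow \mathrm{yield}(L_1), \ldots, a_n \leftarrow \mathrm{yield}(L_n)\rangle$. (Thus: "yield of tree concatenation is string substitution of yields".)

Exercise 2.31. Prove that Definitions 2.27 and 2.29 give exactly the same result for $L\langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n\rangle$ where $a_1, \ldots, a_n \in \Sigma_0$ and $L, L_1, \ldots, L_n$ are tree languages over $\Sigma$ (and thus, string languages over $\Sigma \cup \{[, ]\}$).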
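The difference between the nondeterministic tree concatenation of Definition 2.27 and the deterministic variant of Remark 2.28(2) shows up as soon as a leaf symbol occurs twice. The following sketch handles finite tree languages only, given as Python sets of tree strings with single-character symbols; all function names are illustrative.

```python
from itertools import product


def is_leaf(t, i):
    """True if position i of the tree string t holds a symbol occurrence of rank 0."""
    return t[i] not in "[]" and (i + 1 == len(t) or t[i + 1] != "[")


def nondet_concat(t, langs):
    """Definition 2.27 for one tree: every leaf occurrence chooses independently."""
    options = [sorted(langs[ch]) if is_leaf(t, i) and ch in langs else [ch]
               for i, ch in enumerate(t)]
    return {"".join(choice) for choice in product(*options)}


def det_concat(t, langs):
    """Remark 2.28(2): all occurrences of a symbol receive the same element."""
    symbols = sorted(langs)
    result = set()
    for picks in product(*(sorted(langs[a]) for a in symbols)):
        subst = dict(zip(symbols, picks))
        result.add("".join(subst.get(ch, ch) if is_leaf(t, i) else ch
                           for i, ch in enumerate(t)))
    return result


def lang_concat(lang, langs):
    """Definition 2.27(iii): the union over all trees t of the language."""
    return set().union(*(nondet_concat(t, langs) for t in lang))


langs = {"x": {"c", "d"}}
print(sorted(nondet_concat("b[xx]", langs)))   # four trees
print(sorted(det_concat("b[xx]", langs)))      # two trees
```

With singleton languages both functions return the same single tree, in line with Remark 2.28(1).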

Exercise 2.32. Define the notion of associativity for tree concatenation, and show that tree concatenation is associative. Show that, in general, deterministic tree concatenation is not associative (cf. Remark 2.28(2)).

We shall need the following special case of tree concatenation.

Definition 2.33. Let $\Sigma$ be a ranked alphabet and let $S$ be a set of symbols or a tree language. Then the set of trees indexed by $S$, denoted by $T_\Sigma(S)$, is defined inductively as follows.
(i) $S \cup \Sigma_0 \subseteq T_\Sigma(S)$
(ii) If $k \geq 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma(S)$, then $a[t_1 \cdots t_k] \in T_\Sigma(S)$.

Note that $T_\Sigma(\emptyset) = T_\Sigma$. Thus, if $S$ is a set of symbols, then $T_\Sigma(S) = T_{\Sigma \cup S}$, where the elements of $S$ are assumed to have rank 0. If $S$ is a tree language over a ranked alphabet $\Delta$, then $T_\Sigma(S)$ is a tree language over the ranked alphabet $\Sigma \cup \Delta$.

Exercise 2.34. Show that, for any $a \in \Sigma_0$, $T_\Sigma(S) = T_\Sigma \cdot_a (S \cup \{a\})$.

We close this section with two general remarks.

Remark 2.35. Definition 2.5 of a tree is of course rather arbitrary. Other, equally useful, ways of defining trees as a special kind of strings are obtained by replacing $a[t_1 \cdots t_k]$ in Definition 2.5 by $[t_1 \cdots t_k]a$ or $a\, t_1 \cdots t_k]$ or $[a\, t_1 \cdots t_k]$ or $a\, t_1 \cdots t_k$ (only in the case that each symbol has exactly one rank) or $[_a\, t_1 \cdots t_k]$ (where $[_a$ is a new symbol for each $a$) or $a[t_1, t_2, \ldots, t_k]$ (where "," is a new symbol).

Remark 2.36. Remark on the general philosophy in tree language theory. The general philosophy looks like this:

[Diagram relating the parts (1), (2) and (3).]

(1) Take vertical string language theory (cf. Definition 2.21),
(2) generalize it to tree language theory, and
(3) map this into horizontal string language theory via the yield operation (Definition 2.15).

The fourth part of the philosophy is

(4) Tree language theory is a specific part of string language theory,

illustrated as follows:


[Figure: the four parts of the philosophy illustrated on an example tree.]

Example:
(1) (vertical) string concatenation
(2) tree concatenation
(3) (horizontal) string substitution (see Exercise 2.30)
(4) (2) is a special case of (3) (see Exercise 2.31)

3 Recognizable tree languages

3.1 Finite tree automata and regular tree grammars

Let us first consider the usual finite automaton on strings. A deterministic finite automaton is a structure $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is the set of states, $\Sigma$ is the input alphabet, $q_0$ is the initial state, $F$ is the set of final states and $\delta$ is a family $\{\delta_a\}_{a \in \Sigma}$, where $\delta_a : Q \to Q$ is the transition function for the input $a$.

There are several ways to describe the functioning of $M$ and the language it recognizes. One of them (see for instance [Sal, I.4]) is to describe explicitly the sequence of steps taken by the automaton while processing some input string. This point of view will be considered in Part (4). Another way is to give a recursive definition of the effect of an input string on the state of $M$. Since a recursive definition is particularly suitable for generalization to trees, let us consider one in detail. We define a function $\hat{\delta} : \Sigma^* \to Q$ such that, for $w \in \Sigma^*$, $\hat{\delta}(w)$ is intuitively the state $M$ reaches after processing $w$, starting from the initial state $q_0$:
(i) $\hat{\delta}(\lambda) = q_0$
(ii) for $w \in \Sigma^*$ and $a \in \Sigma$, $\hat{\delta}(wa) = \delta_a(\hat{\delta}(w))$.
The language recognized by $M$ is $L(M) = \{w \in \Sigma^* \mid \hat{\delta}(w) \in F\}$.

When considering this definition of $\hat{\delta}$ for "bottom-up" monadic trees (see Definition 2.21), one easily arrives at the following generalization to the tree case: There should be a "start state" for each element of $\Sigma_0$. The finite tree automaton starts at all leaves ("at the same time", "in parallel") and processes the tree in a bottom-up fashion. The automaton arrives at each node of rank $k$ with a sequence of $k$ states (one state for each direct subtree of the node), and the transition function $\delta_a$ of the label $a$ of that node is a mapping $\delta_a : Q^k \to Q$, which, from that sequence of $k$ states, determines

the state at that node. A tree is recognized iff the tree automaton is in a final state at the root of the tree. Formally:

Definition 3.1. A deterministic bottom-up finite tree automaton is a structure $M = (Q, \Sigma, \delta, s, F)$, where
- $Q$ is a finite set (of states),
- $\Sigma$ is a ranked alphabet (of input symbols),
- $\delta$ is a family $\{\delta^k_a\}_{k \geq 1,\, a \in \Sigma_k}$ of mappings $\delta^k_a : Q^k \to Q$ (the transition function for $a \in \Sigma_k$),
- $s$ is a family $\{s_a\}_{a \in \Sigma_0}$ of states $s_a \in Q$ (the initial state for $a \in \Sigma_0$), and
- $F$ is a subset of $Q$ (the set of final states).

The mapping $\hat{\delta} : T_\Sigma \to Q$ is defined recursively as follows:
(i) for $a \in \Sigma_0$, $\hat{\delta}(a) = s_a$,
(ii) for $k \geq 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $\hat{\delta}(a[t_1 \cdots t_k]) = \delta^k_a(\hat{\delta}(t_1), \ldots, \hat{\delta}(t_k))$.

The tree language recognized by $M$ is defined to be $L(M) = \{t \in T_\Sigma \mid \hat{\delta}(t) \in F\}$.

Intuitively, $\hat{\delta}(t)$ is the state reached by $M$ after bottom-up processing of $t$. For convenience, when $k$ is understood, we shall write $\delta_a$ rather than $\delta^k_a$. Note therefore that each symbol $a \in \Sigma$ may have several transition functions $\delta_a$ (one for each of its ranks). We shall abbreviate "finite tree automaton" by "fta", and "deterministic" by "det.".

Definition 3.2. A tree language $L$ is called recognizable (or regular) if $L = L(M)$ for some det. bottom-up fta $M$. The class of recognizable tree languages will be denoted by RECOG.

Example 3.3. Let us consider the det. bottom-up fta $M = (Q, \Sigma, \delta, s, F)$, where $Q = \{0, 1, 2, 3\}$, $\Sigma_0 = \{0, 1, 2, \ldots, 9\}$, $\Sigma_2 = \{+, *\}$, $s_a \equiv a \pmod 4$, $F = \{1\}$, and $\delta_+$ and $\delta_*$ (both mappings $Q^2 \to Q$) are addition modulo 4 and multiplication modulo 4 respectively. Then $M$ recognizes the set of all "expressions" whose value modulo 4 is 1. Consider for instance the expression $+[+[07]*[2*[73]]]$, the prefix form of $(0 + 7) + (2 * (7 * 3))$. In the following picture,

[Figure: the tree of the expression with the state at each node, in bracket form: +(1)[ +(3)[ 0(0) 7(3) ] *(2)[ 2(2) *(1)[ 7(3) 3(3) ] ] ].]

the state of $M$ at each node of the tree is indicated between parentheses.
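The recursive definition of $\hat{\delta}$ in Definition 3.1 translates almost literally into a recursive evaluator. The sketch below assumes single-character symbols and trees in string notation; the function `run`, the dictionaries `start` and `delta` (keyed by symbol and rank) and the set `final` are illustrative names for $s$, $\delta$ and $F$, instantiated with the automaton of Example 3.3.

```python
def run(t, start, delta):
    """State reached at the root of the tree string t by bottom-up processing."""
    def state(i):
        label = t[i]
        i += 1
        if i < len(t) and t[i] == "[":
            i += 1
            kids = []
            while t[i] != "]":
                q, i = state(i)
                kids.append(q)
            # Apply the transition function of this label for rank len(kids).
            return delta[label, len(kids)](*kids), i + 1
        return start[label], i          # a leaf: its initial state

    q, end = state(0)
    if end != len(t):
        raise ValueError("not a single well-formed tree")
    return q


# The automaton of Example 3.3: arithmetic modulo 4, final state 1.
start = {str(d): d % 4 for d in range(10)}
delta = {("+", 2): lambda x, y: (x + y) % 4,
         ("*", 2): lambda x, y: (x * y) % 4}
final = {1}

t = "+[+[07]*[2*[73]]]"
print(run(t, start, delta), run(t, start, delta) in final)
```

Keying `delta` by the pair (symbol, rank) reflects the remark that a symbol may have one transition function for each of its ranks.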

Example 3.4. Let $\Sigma_0 = \{a\}$ and $\Sigma_2 = \{b\}$. Consider the language of all trees in $T_\Sigma$ which have a "right comb-like" structure like for instance the tree $b[ab[ab[ab[aa]]]]$. This tree language is recognized by the det. bottom-up fta $M = (Q, \Sigma, \delta, s, F)$, where $Q = \{A, C, W\}$, $s_a = A$, $F = \{C\}$ and $\delta_b$ is defined by $\delta_b(A, A) = \delta_b(A, C) = C$ and $\delta_b(q_1, q_2) = W$ for all other pairs of states $(q_1, q_2)$.

Exercise 3.5. Let $\Sigma_0 = \{a, b\}$, $\Sigma_1 = \{p\}$ and $\Sigma_2 = \{p, q\}$. Construct det. bottom-up finite tree automata recognizing the following tree languages:
(i) the language of all trees $t$, such that if a node of $t$ is labeled $q$, then its descendants are labeled $q$ or $a$;
(ii) the set of all trees $t$ such that $\mathrm{yield}(t) \in a^+ b^+$;
(iii) the set of all trees $t$ such that the total number of $p$'s occurring in $t$ is odd.

A (theoretically) convenient extension of the deterministic finite automaton is to make it nondeterministic. A nondeterministic finite automaton (on strings) is a structure $M = (Q, \Sigma, \delta, S, F)$, where $Q$, $\Sigma$ and $F$ are the same as in the deterministic case, $S$ is a set of initial states, and, for each $a \in \Sigma$, $\delta_a$ is a mapping $Q \to \mathcal{P}(Q)$ (intuitively, $\delta_a(q)$ is the set of states which $M$ can possibly, nondeterministically, enter when reading $a$ in state $q$). Again a mapping $\hat{\delta}$, now from $\Sigma^*$ into $\mathcal{P}(Q)$, can be defined, such that for every $w \in \Sigma^*$, $\hat{\delta}(w)$ is the set of states $M$ can possibly reach after processing $w$, having started from one of the initial states in $S$:
(i) $\hat{\delta}(\lambda) = S$,
(ii) for $w \in \Sigma^*$ and $a \in \Sigma$, $\hat{\delta}(wa) = \bigcup \{\delta_a(q) \mid q \in \hat{\delta}(w)\}$.
The language recognized by $M$ is $L(M) = \{w \in \Sigma^* \mid \hat{\delta}(w) \cap F \neq \emptyset\}$.

Generalizing to trees we obtain the following definition.

Definition 3.6. A nondeterministic bottom-up finite tree automaton is a 5-tuple $M = (Q, \Sigma, \delta, S, F)$, where $Q$, $\Sigma$ and $F$ are as in the deterministic case, $S$ is a family $\{S_a\}_{a \in \Sigma_0}$ such that $S_a \subseteq Q$ for each $a \in \Sigma_0$, and $\delta$ is a family $\{\delta^k_a\}_{k \geq 1,\, a \in \Sigma_k}$ of mappings $\delta^k_a : Q^k \to \mathcal{P}(Q)$. The mapping $\hat{\delta} : T_\Sigma \to \mathcal{P}(Q)$ is defined recursively by
(i) for $a \in \Sigma_0$, $\hat{\delta}(a) = S_a$,
(ii) for $k \geq 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $\hat{\delta}(a[t_1 \cdots t_k]) = \bigcup \{\delta_a(q_1, \ldots, q_k) \mid q_i \in \hat{\delta}(t_i) \text{ for } 1 \leq i \leq k\}$.
The tree language recognized by $M$ is $L(M) = \{t \in T_\Sigma \mid \hat{\delta}(t) \cap F \neq \emptyset\}$. Note that, for $q \in Q^k$, $\delta^k_a(q)$ may be empty.

Example 3.7. Let $\Sigma_0 = \{p\}$ and $\Sigma_2 = \{a, b\}$. Consider the following tree language over $\Sigma$:

$L = \{u_1\, a[a[s_1 s_2]\, a[t_1 t_2]]\, u_2 \in T_\Sigma \mid \ldots\} \cup \{u_1\, b[b[s_1 s_2]\, b[t_1 t_2]]\, u_2 \in T_\Sigma \mid \ldots\}$

where "$\ldots$" stands for $u_1, u_2 \in (\Sigma \cup \{[, ]\})^*$, $s_1, s_2, t_1, t_2 \in T_\Sigma$. In other words, $L$ is the set of all trees containing a configuration in which a node labeled $a$ has two direct descendants labeled $a$, or a configuration in which a node labeled $b$ has two direct descendants labeled $b$ (or both). $L$ is recognized by the nondet. bottom-up fta $M = (Q, \Sigma, \delta, S, F)$, where $Q = \{q_s, q_a, q_b, r\}$, $S_p = \{q_s\}$, $F = \{r\}$ and
- $\delta_a(q_s, q_s) = \{q_s, q_a\}$, $\delta_b(q_s, q_s) = \{q_s, q_b\}$,
- $\delta_a(q_a, q_a) = \delta_b(q_b, q_b) = \{r\}$,
- for all $q \in Q$: $\delta_a(q, r) = \delta_a(r, q) = \delta_b(q, r) = \delta_b(r, q) = \{r\}$,
- and $\delta_x(q_1, q_2) = \emptyset$ for all other possibilities.

It is rather obvious in the last example that we can find a deterministic bottom-up fta recognizing the same language (find it!). We now show that this is possible in general (as in the case of strings).

Theorem 3.8. For each nondeterministic bottom-up fta we can find a deterministic one recognizing the same language.

Proof. The proof uses the "subset-construction", well known from the string-case. Let $M = (Q, \Sigma, \delta, S, F)$ be a nondeterministic bottom-up fta. Construct the deterministic bottom-up fta $M_1 = (\mathcal{P}(Q), \Sigma, \delta_1, s_1, F_1)$ such that $(s_1)_a = S_a$ for all $a \in \Sigma_0$, $F_1 = \{Q_1 \in \mathcal{P}(Q) \mid Q_1 \cap F \neq \emptyset\}$, and, for $a \in \Sigma_k$ and $Q_1, \ldots, Q_k \subseteq Q$,

$(\delta_1)_a(Q_1, \ldots, Q_k) = \bigcup \{\delta_a(q_1, \ldots, q_k) \mid q_i \in Q_i \text{ for all } 1 \leq i \leq k\}$.

It is straightforward to show, using Definitions 3.1 and 3.6, that $\hat{\delta}_1(t) = \hat{\delta}(t)$ for all $t \in T_\Sigma$ (proof by induction on $t$). From this it follows that $L(M_1) = \{t \mid \hat{\delta}_1(t) \in F_1\} = \{t \mid \hat{\delta}(t) \cap F \neq \emptyset\} = L(M)$.

Exercise 3.9. Check the proof of Theorem 3.8. Construct the det. bottom-up fta corresponding to the fta $M$ of Example 3.7 according to that proof, and compare this det. fta with the one you found before.

Let us now consider the top-down generalization of the finite automaton. Let $M = (Q, \Sigma, \delta, q_0, F)$ be a det. finite automaton. Another way to define $L(M)$ is by giving a recursive definition of a mapping $\tilde{\delta} : \Sigma^* \to \mathcal{P}(Q)$ such that intuitively, for each $w \in \Sigma^*$, $\tilde{\delta}(w)$ is the set of states $q$ such that the machine $M$, when started in state $q$, enters a final state after processing $w$. The definition of $\tilde{\delta}$ is as follows:
(i) $\tilde{\delta}(\lambda) = F$
(ii) for $w \in \Sigma^*$ and $a \in \Sigma$, $\tilde{\delta}(aw) = \{q \mid \delta_a(q) \in \tilde{\delta}(w)\}$
(the last line may be read as: to check whether, starting in $q$, $M$ recognizes $aw$, compute $q_1 = \delta_a(q)$ and check whether $M$ recognizes $w$ starting in $q_1$). The language recognized by $M$ is $L(M) = \{w \in \Sigma^* \mid q_0 \in \tilde{\delta}(w)\}$.

This definition, applied to "top-down" monadic trees, leads to the following generalization to arbitrary trees. The finite tree automaton starts at the root of the tree in the initial state, and processes the tree in a top-down

fashion. The automaton arrives at each node in one state, and the transition function $\delta_a$ of the label $a$ of that node is a mapping $\delta_a : Q \to Q^k$ (where $k$ is the rank of the node), which, from that state, determines the state in which to continue for each direct descendant of the node (the automaton "splits up" into $k$ independent copies, one for each direct subtree of the node). Finally the automaton arrives at all leaves of the tree. There should be a set of final states for each element of $\Sigma_0$. The tree is recognized if the fta arrives at each leaf in a state which is final for the label of that leaf. Formally:

Definition 3.10. A deterministic top-down finite tree automaton is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where
- $Q$ is a finite set (of states),
- $\Sigma$ is a ranked alphabet (of input symbols),
- $\delta$ is a family $\{\delta^k_a\}_{k \geq 1,\, a \in \Sigma_k}$ of mappings $\delta^k_a : Q \to Q^k$ (the transition function for $a \in \Sigma_k$),
- $q_0$ is in $Q$ (the initial state), and
- $F$ is a family $\{F_a\}_{a \in \Sigma_0}$ of sets $F_a \subseteq Q$ (the set of final states for $a \in \Sigma_0$).

The mapping $\tilde{\delta} : T_\Sigma \to \mathcal{P}(Q)$ is defined recursively by
(i) for $a \in \Sigma_0$, $\tilde{\delta}(a) = F_a$
(ii) for $k \geq 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $\tilde{\delta}(a[t_1 \cdots t_k]) = \{q \mid \delta_a(q) \in \tilde{\delta}(t_1) \times \cdots \times \tilde{\delta}(t_k)\}$.

The tree language recognized by $M$ is defined to be $L(M) = \{t \in T_\Sigma \mid q_0 \in \tilde{\delta}(t)\}$.

Intuitively, $\tilde{\delta}(t)$ is the set of states $q$ such that $M$, when starting at the root of $t$ in state $q$, arrives at the leaves of $t$ in final states.

Example 3.11. Consider the tree language of Exercise 3.5(i). A det. top-down fta recognizing this language is $M = (Q, \Sigma, \delta, q_0, F)$ where $Q = \{A, R, W\}$, $q_0 = A$, $F_a = \{A, R\}$, $F_b = \{A\}$ and
- $\delta^1_p(A) = A$, $\delta^1_p(R) = \delta^1_p(W) = W$,
- $\delta^2_p(A) = (A, A)$, $\delta^2_p(R) = \delta^2_p(W) = (W, W)$,
- $\delta_q(A) = (R, R)$, $\delta_q(R) = (R, R)$, $\delta_q(W) = (W, W)$.

Exercise 3.12. Let $\Sigma$ be a ranked alphabet, and $p \in \Sigma_2$. Let $L$ be the tree language defined recursively by
(i) for all $t_1, t_2 \in T_\Sigma$, $p[t_1 t_2] \in L$
(ii) for all $a \in \Sigma_k$, if $t_1, \ldots, t_k \in L$, then $a[t_1 \cdots t_k] \in L$ ($k \geq 1$).
Construct a deterministic top-down fta recognizing $L$. Give a nonrecursive description of $L$.

Exercise 3.13. Construct a det. top-down fta $M$ such that $\mathrm{yield}(L(M)) = a^+ b^+$.

We now show that the det. top-down fta recognizes fewer languages than its bottom-up counterpart.
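Before turning to that result, the top-down automaton of Definition 3.10 can be made concrete on Example 3.11. The sketch below does not compute the sets $\tilde{\delta}(t)$; it runs the automaton from a given state and checks every leaf, which by the intuitive reading of $\tilde{\delta}$ decides whether that state belongs to $\tilde{\delta}(t)$. Single-character symbols are assumed, and `parse`, `accepts`, `DELTA` and `FINAL` are illustrative names.

```python
def parse(s):
    """Turn the string notation a[t1...tk] into (label, [subtrees]); well-formed input assumed."""
    def tree(i):
        label, i, kids = s[i], i + 1, []
        if i < len(s) and s[i] == "[":
            i += 1
            while s[i] != "]":
                kid, i = tree(i)
                kids.append(kid)
            i += 1
        return (label, kids), i

    t, end = tree(0)
    if end != len(s):
        raise ValueError("trailing symbols after the tree")
    return t


def accepts(t, q, delta, final):
    """True if the automaton, started at the root of t in state q, reaches final states at all leaves."""
    label, kids = t
    if not kids:
        return q in final.get(label, set())
    states = delta[label, len(kids)][q]        # one state for each direct subtree
    return all(accepts(k, qk, delta, final) for k, qk in zip(kids, states))


# The automaton of Example 3.11 (states A, R, W; initial state A).
DELTA = {
    ("p", 1): {"A": ("A",), "R": ("W",), "W": ("W",)},
    ("p", 2): {"A": ("A", "A"), "R": ("W", "W"), "W": ("W", "W")},
    ("q", 2): {"A": ("R", "R"), "R": ("R", "R"), "W": ("W", "W")},
}
FINAL = {"a": {"A", "R"}, "b": {"A"}}

for s in ["q[aq[aa]]", "p[q[aa]b]", "q[ab]", "q[p[a]a]"]:
    print(s, accepts(parse(s), "A", DELTA, FINAL))
```

The state R records "somewhere below a q", so a leaf b or a node p reached in state R leads to rejection, matching the language of Exercise 3.5(i).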

Theorem 3.14. There are recognizable tree languages which cannot be recognized by a deterministic top-down fta.

Proof. Let $\Sigma_0 = \{a, b\}$ and $\Sigma_2 = \{S\}$. Consider the (finite!) tree language $L = \{S[ab], S[ba]\}$. Suppose that the det. top-down fta $M = (Q, \Sigma, \delta, q_0, F)$ recognizes $L$. Let $\delta_S(q_0) = (q_1, q_2)$. Since $S[ab] \in L(M)$, $q_1 \in F_a$ and $q_2 \in F_b$. But, since $S[ba] \in L(M)$, $q_1 \in F_b$ and $q_2 \in F_a$. Hence both $S[aa]$ and $S[bb]$ are in $L(M)$. Contradiction.

Exercise 3.15. Show that the tree languages of Exercise 3.5(ii,iii) are not recognizable by a det. top-down fta.

It will be clear that the nondeterministic top-down fta is able to recognize all recognizable languages. We give the definition without comment.

Definition 3.16. A nondeterministic top-down finite tree automaton is a structure $M = (Q, \Sigma, \delta, S, F)$, where $Q$, $\Sigma$ and $F$ are as in the deterministic case, $S$ is a subset of $Q$ and $\delta$ is a family $\{\delta^k_a\}_{k \geq 1,\, a \in \Sigma_k}$ of mappings $\delta^k_a : Q \to \mathcal{P}(Q^k)$. The mapping $\tilde{\delta} : T_\Sigma \to \mathcal{P}(Q)$ is defined recursively as follows
(i) for $a \in \Sigma_0$, $\tilde{\delta}(a) = F_a$,
(ii) for $k \geq 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $\tilde{\delta}(a[t_1 \cdots t_k]) = \{q \mid \exists (q_1, \ldots, q_k) \in \delta_a(q) : q_i \in \tilde{\delta}(t_i) \text{ for all } 1 \leq i \leq k\}$.
The tree language recognized by $M$ is $L(M) = \{t \in T_\Sigma \mid \tilde{\delta}(t) \cap S \neq \emptyset\}$.

We now show that, nondeterministically, there is no difference between bottom-up and top-down recognition.

Theorem 3.17. A tree language is recognizable by a nondet. bottom-up fta iff it is recognizable by a nondet. top-down fta.

Proof. Let us say that a nondet. bottom-up fta $M = (Q, \Sigma, \delta, S, F)$ and a nondet. top-down fta $N = (P, \Delta, \mu, R, G)$ are "associated" if the following requirements are satisfied:
(i) $Q = P$, $\Sigma = \Delta$, $F = R$ and, for all $a \in \Sigma_0$, $S_a = G_a$;
(ii) for all $k \geq 1$, $a \in \Sigma_k$ and $q_1, \ldots, q_k, q \in Q$, $q \in \delta_a(q_1, \ldots, q_k)$ iff $(q_1, \ldots, q_k) \in \mu_a(q)$.
In that case, one can easily prove by induction that $\hat{\delta} = \tilde{\mu}$, and so $L(M) = L(N)$. Since obviously for each nondet. bottom-up fta there is an associated nondet. top-down fta, and vice versa, the theorem holds.

Thus the classes of tree languages recognized by the nondet. bottom-up, det. bottom-up and nondet. top-down fta are all equal (and are called RECOG), whereas the class of tree languages recognized by the det. top-down fta is a proper subclass of RECOG.

The next victim of generalization is the regular grammar (right-linear, type-3 grammar). In this case it seems appropriate to take the top-down point of view only. Consider an

20 ordinry regulr grmmr G = (N, Σ, R, S). All rules hve either the form A wb or the form A w, where A, B N nd w Σ. Mondiclly, the string wb my be considered s the result of treeconctenting the tree we with B t e, where B is of rnk 0. Thus we cn tke the generliztion of strings of the form wb or w to be trees in T (N), where is rnked lphbet (for the definition of T (N), see Definition 2.33). Thus, let us consider tree grmmr with rules of the form A t, where A N nd t T (N). Obviously, the ppliction of rule A t to tree s T (N) should intuitively consist of replcing one occurrence of A in s by the tree t. Strting with the initil nonterminl, nonterminls t the frontier of the tree re then repetedly replced by right hnd sides of rules, until the tree does not contin nonterminls ny more. Now, since trees re defined s strings, it turns out tht this process is precisely the wy context-free grmmr works. Thus we rrive t the following forml definition. Definition A regulr tree grmmr is tuple G = (N, Σ, R, S) where N is finite set (of nonterminls), Σ is rnked lphbet (of terminls), such tht Σ N =, S N is the initil nonterminl, nd R is finite set of rules of the form A t with A N nd t T Σ (N). The tree lnguge generted by G, denoted by L(G), is defined to be L(H), where H is the context-free grmmr (N, Σ {[, ]}, R, S). We shll use = G nd = G (or nd = when G is understood) to denote the restrictions of = H nd = H to T Σ (N). Exmple Let Σ 0 = {, b, c, d, e}, Σ 2 = {p} nd Σ 3 = {p, q}. Consider the regulr tree grmmr G = (N, Σ, R, S), where N = {S, T } nd R consists of the rules S p[t ], T q[cp[dt ]b] nd T e. Then G genertes the tree p[q[cp[de]b]] s follows: or, pictorilly, S p[t ] p[q[cp[dt ]b]] p[q[cp[de]b]] S p T c d p q p T b c d p q p e b. The tree lnguge generted by G is {p[(q[cp[d) n e(]b]) n ] n 0}. Exercise Write regulr tree grmmrs generting the tree lnguges of Exercise 3.5. 
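The derivation process of the example grammar (rules $S \to p[aTa]$, $T \to q[cp[dT]b]$, $T \to e$) can be replayed mechanically. The following is a minimal sketch, assuming trees are nested tuples and always rewriting the leftmost nonterminal; the helper names `show`, `is_terminal`, `rewrite_leftmost` and `generate` are hypothetical and not part of the notes.

```python
# Trees are nested tuples (label, child_1, ..., child_k); nonterminals occur as leaves.
NONTERMINALS = {"S", "T"}
RULES = [
    ("S", ("p", ("a",), ("T",), ("a",))),
    ("T", ("q", ("c",), ("p", ("d",), ("T",)), ("b",))),
    ("T", ("e",)),
]

def show(tree):
    """Write a tree in the bracket notation used in the text, e.g. p[aTa]."""
    label, children = tree[0], tree[1:]
    if not children:
        return label
    return label + "[" + "".join(show(c) for c in children) + "]"

def is_terminal(tree):
    """True if the tree contains no nonterminal."""
    return tree[0] not in NONTERMINALS and all(is_terminal(c) for c in tree[1:])

def rewrite_leftmost(tree):
    """All trees obtained by applying one rule to the leftmost nonterminal."""
    label, children = tree[0], tree[1:]
    if label in NONTERMINALS:
        return [rhs for lhs, rhs in RULES if lhs == label]
    for i, child in enumerate(children):
        if not is_terminal(child):
            return [(label,) + children[:i] + (new,) + children[i + 1:]
                    for new in rewrite_leftmost(child)]
    return []

def generate(start, max_steps):
    """Terminal trees derivable from `start` in at most `max_steps` rule applications."""
    finished, frontier = [], [(start,)]
    for _ in range(max_steps):
        next_frontier = []
        for tree in frontier:
            for new in rewrite_leftmost(tree):
                (finished if is_terminal(new) else next_frontier).append(new)
        frontier = next_frontier
    return finished

print([show(t) for t in generate("S", 3)])  # ['p[aea]', 'p[aq[cp[de]b]a]']
```

The two printed trees are the members of the generated language for $n = 0$ and $n = 1$.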
As in the case of strings, each regular tree grammar is equivalent to one that has the property that at each step in the derivation exactly one terminal symbol is produced.

Definition. A regular tree grammar $G = (N, \Sigma, R, S)$ is in normal form, if each of its rules is either of the form $A \to a[B_1 \cdots B_k]$ or of the form $A \to b$, where $k \ge 1$, $a \in \Sigma_k$, $A, B_1, \ldots, B_k \in N$ and $b \in \Sigma_0$.
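The normal form condition is a purely syntactic check on right hand sides. As a minimal sketch (the representation of rules as pairs of a nonterminal and a nested tuple, and the names `in_normal_form`, `EXAMPLE_RULES` and `SMALL_RULES`, are assumptions made for illustration):

```python
def in_normal_form(rules, nonterminals):
    """True if every rule is A -> a[B_1...B_k] (k >= 1) or A -> b with b of rank 0."""
    for _lhs, rhs in rules:
        label, children = rhs[0], rhs[1:]
        if label in nonterminals:
            return False  # a rule of the form A -> B
        if any(len(c) != 1 or c[0] not in nonterminals for c in children):
            return False  # some direct subtree is not a single nonterminal
    return True

# The grammar of Example 3.19: S -> p[aTa], T -> q[cp[dT]b], T -> e.
EXAMPLE_RULES = [
    ("S", ("p", ("a",), ("T",), ("a",))),
    ("T", ("q", ("c",), ("p", ("d",), ("T",)), ("b",))),
    ("T", ("e",)),
]
# A hypothetical grammar that is already in normal form: S -> b[SS], S -> a.
SMALL_RULES = [("S", ("b", ("S",), ("S",))), ("S", ("a",))]

print(in_normal_form(EXAMPLE_RULES, {"S", "T"}))  # False
print(in_normal_form(SMALL_RULES, {"S"}))         # True
```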

Theorem. Each regular tree grammar has an equivalent regular tree grammar in normal form.

Proof. Consider an arbitrary regular tree grammar $G = (N, \Sigma, R, S)$. Let $G_1 = (N, \Sigma, R_1, S)$ be the regular tree grammar such that $(A \to t) \in R_1$ if and only if $t \notin N$ and there is a $B$ in $N$ such that $A \Rightarrow^*_G B$ and $(B \to t) \in R$. Then $L(G_1) = L(G)$, and $R_1$ does not contain rules of the form $A \to B$ with $A, B \in N$. (This is the well-known procedure of removing rules $A \to B$ from a context-free grammar.) Suppose that $G_1$ is not yet in normal form. Then there is a rule of the form $A \to a[t_1 \cdots t_i \cdots t_k]$ such that $t_i \notin N$. Construct a new regular tree grammar $G_2$ by adding a new nonterminal $B$ to $N$ and replacing the rule $A \to a[t_1 \cdots t_i \cdots t_k]$ by the two rules $A \to a[t_1 \cdots B \cdots t_k]$ and $B \to t_i$ in $R_1$. It should be clear that $L(G_2) = L(G_1)$, and that, by repeating the latter process a finite number of times, one ends up with an equivalent grammar in normal form.

Exercise. Put the regular tree grammar of Example 3.19 into normal form.

Exercise. What does Theorem 3.22 actually say in the case of strings (the monadic case)?

In the next theorem we show that the regular tree grammars generate exactly the class of recognizable tree languages.

Theorem. A tree language can be generated by a regular tree grammar iff it is an element of RECOG.

Proof. Exercise.

Note therefore that each recognizable tree language is a special kind of context-free language.

Exercise. Show that all finite tree languages are in RECOG.

Exercise. Show that each recognizable tree language can be generated by a backwards deterministic regular tree grammar. A regular tree grammar is called backwards deterministic if (1) it may have more than one initial nonterminal, (2) it is in normal form, and (3) rules with the same right hand side are equal.

It is now easy to show the connection between recognizable tree languages and context-free languages. Let CFL denote the class of context-free languages.

Theorem. $\mathrm{yield}(\mathrm{RECOG}) = \mathrm{CFL}$ (in words, the yield of each recognizable tree language is context-free, and each context-free language is the yield of some recognizable tree language).

Proof. Let $G = (N, \Sigma, R, S)$ be a regular tree grammar. Consider the context-free grammar $G' = (N, \Sigma_0, R', S)$, where $R' = \{A \to \mathrm{yield}(t) \mid A \to t \in R\}$. Then $L(G') = \mathrm{yield}(L(G))$.
Now let $G = (N, \Sigma, R, S)$ be a context-free grammar. Let $*$ be a new symbol, and let $\Delta = \Sigma \cup \{e, *\}$ be the ranked alphabet such that $\Delta_0 = \Sigma \cup \{e\}$, and, for $k \ge 1$, $\Delta_k = \{*\}$ if and only if there is a rule in $R$ with a right hand side of length $k$. Consider the regular tree grammar $G' = (N, \Delta, R', S)$ such that
(i) if $A \to w$ is in $R$, $w \ne \lambda$, then $A \to *[w]$ is in $R'$;
(ii) if $A \to \lambda$ is in $R$, then $A \to e$ is in $R'$.
Then $\mathrm{yield}(L(G')) = L(G)$.

In the next section we shall give the connection between regular tree languages and derivation trees of context-free languages.

Exercise. A context-free grammar is invertible if rules with the same right hand side are equal. Show that each context-free language can be generated by an invertible context-free grammar.

For regular string languages a useful stronger version of Theorem 3.28 can be proved.

Theorem. Let $\Sigma$ be a ranked alphabet. If $R$ is a regular string language over $\Sigma_0$, then the tree language $\{t \in T_\Sigma \mid \mathrm{yield}(t) \in R\}$ is recognizable.

Proof. Let $M = (Q, \Sigma, \delta, q_0, F)$ be a deterministic finite automaton recognizing $R$. We construct a nondeterministic bottom-up fta $N = (Q \times Q, \Sigma, \mu, S, G)$, which, for each tree $t$, checks whether a successful computation of $M$ on $\mathrm{yield}(t)$ is possible. The states of $N$ are pairs of states of $M$. Intuitively we want that $(q_1, q_2) \in \tilde\mu(t)$ if and only if $M$ arrives in state $q_2$ after processing $\mathrm{yield}(t)$, starting from state $q_1$. Thus we define
(i) for all $a \in \Sigma_0$, $S_a = \{(q_1, q_2) \mid \delta_a(q_1) = q_2\}$;
(ii) for all $k \ge 1$, $a \in \Sigma_k$ and states $q_1, q_2, \ldots, q_{2k} \in Q$,
$$\mu_a((q_1, q_2), (q_3, q_4), \ldots, (q_{2k-1}, q_{2k})) = \begin{cases} \{(q_1, q_{2k})\} & \text{if } q_{2i} = q_{2i+1} \text{ for all } 1 \le i \le k-1, \\ \emptyset & \text{otherwise.} \end{cases}$$
Then $L(N) = \{t \in T_\Sigma \mid \mathrm{yield}(t) \in R\}$.

Exercise. Show that, if $\Sigma_2 \ne \emptyset$, then Theorem 3.30 holds conversely: if $L$ is a string language such that $\{t \in T_\Sigma \mid \mathrm{yield}(t) \in L\}$ is recognizable, then $L$ is regular. What can you say in case $\Sigma_2 = \emptyset$?

3.2 Closure properties of recognizable tree languages

We first consider set-theoretic operations.

Theorem. RECOG is closed under union, intersection and complementation.

Proof. To show closure under complementation, consider a deterministic bottom-up fta $M = (Q, \Sigma, \delta, s, F)$. Let $N$ be the det. bottom-up fta $(Q, \Sigma, \delta, s, Q - F)$. Then, obviously, $L(N) = T_\Sigma - L(M)$. To show closure under union, consider two regular tree grammars $G_i = (N_i, \Sigma_i, R_i, S_i)$, $i = 1, 2$ (with $N_1 \cap N_2 = \emptyset$). Then $G = (N_1 \cup N_2 \cup \{S\}, \Sigma_1 \cup \Sigma_2, R_1 \cup R_2 \cup \{S \to S_1, S \to S_2\}, S)$ is a regular tree grammar such that $L(G) = L(G_1) \cup L(G_2)$.

As a corollary we obtain the following closure property of context-free languages.

Corollary. CFL is closed under intersection with regular languages.

Proof. Let $L$ and $R$ be a context-free and a regular language respectively. According to Theorem 3.28, there is a recognizable tree language $U$ such that $\mathrm{yield}(U) = L$. Consequently, by Theorems 3.30 and 3.32, the tree language $V = U \cap \{t \mid \mathrm{yield}(t) \in R\}$ is recognizable. Obviously $L \cap R = \mathrm{yield}(V)$ and so, again by Theorem 3.28, $L \cap R$ is context-free.

We now turn to the closure of RECOG under concatenation operations (see Definitions 2.23 and 2.27).

Theorem. For every $k \ge 1$ and $a \in \Sigma_k$, RECOG is closed under $\mathrm{tc}^k_a$.

Proof. Exercise.

Theorem. RECOG is closed under tree concatenation.

Proof. The proof is obtained by generalizing that for regular string languages. Let $n \ge 1$, $a_1, \ldots, a_n \in \Sigma_0$ all different and $L_0, L_1, \ldots, L_n$ recognizable tree languages (we may assume that all languages are over the same ranked alphabet $\Sigma$). Let $G_i = (N_i, \Sigma, R_i, S_i)$ be a regular tree grammar in normal form for $L_i$ ($i = 0, 1, \ldots, n$). A regular tree grammar generating $L_0 \langle a_1 \leftarrow L_1, \ldots, a_n \leftarrow L_n \rangle$ is $G = (\bigcup_{i=0}^{n} N_i, \Sigma, R, S_0)$, where $R = \bar{R}_0 \cup \bigcup_{i=1}^{n} R_i$, and $\bar{R}_0$ is $R_0$ with each rule of the form $A \to a_i$ replaced by the rule $A \to S_i$ ($1 \le i \le n$).

Corollary. CFL is closed under substitution.

Proof. Use Theorem 3.28 and Exercise [...]. Note also that Theorem 3.35 is essentially a special case of Corollary [...].

Next we generalize the notion of (concatenation) closure of string languages to trees, and show that RECOG is closed under this closure operation. We shall, for convenience, restrict ourselves to the case that tree concatenation happens at one element of $\Sigma_0$.
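The grammar construction used in the proof of closure under tree concatenation (Theorem 3.35) amounts to redirecting the leaf rules of the first grammar. The sketch below is only an illustration: it assumes normal-form grammars with pairwise disjoint nonterminal sets, represents a grammar as a pair (list of rules, start nonterminal), and uses the hypothetical names `concatenation_grammar`, `G0` and `G1`.

```python
def concatenation_grammar(grammar0, replacements):
    """Build the rules of G from G_0 and a dict a_i -> G_i.

    Each rule A -> a_i of G_0 is replaced by A -> S_i; all rules of the G_i are added.
    Rules are pairs (lhs, rhs) with rhs a nested tuple (label, child_1, ..., child_k).
    """
    rules0, start0 = grammar0
    rules = []
    for lhs, rhs in rules0:
        if len(rhs) == 1 and rhs[0] in replacements:
            _rules_i, start_i = replacements[rhs[0]]
            rules.append((lhs, (start_i,)))  # A -> a_i becomes A -> S_i
        else:
            rules.append((lhs, rhs))
    for rules_i, _start_i in replacements.values():
        rules.extend(rules_i)
    return rules, start0

# Hypothetical input: G0 generates {b[aa]}, G1 generates {c}.
G0 = ([("S0", ("b", ("A",), ("A",))), ("A", ("a",))], "S0")
G1 = ([("S1", ("c",))], "S1")

rules, start = concatenation_grammar(G0, {"a": G1})
for lhs, rhs in rules:
    print(lhs, "->", rhs)
```

In this small case the resulting grammar has the rules $S_0 \to b[AA]$, $A \to S_1$ and $S_1 \to c$, so it generates $\{b[cc]\}$.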

Definition. Let $a \in \Sigma_0$ and let $L$ be a tree language over $\Sigma$. Then the tree concatenation closure of $L$ at $a$, denoted by $L^{*a}$, is defined to be $\bigcup_{n=0}^{\infty} X_n$, where $X_0 = \{a\}$ and, for $n \ge 0$, $X_{n+1} = X_n \cdot_a (L \cup \{a\})$.

Example. Let $G = (N, \Sigma, R, S)$ be the regular tree grammar with $N = \{S\}$, $\Sigma_0 = \{a\}$, $\Sigma_2 = \{b\}$ and $R = \{S \to b[aS], S \to a\}$. Then $L(G) = \{b[aS]\}^{*S} \cdot_S a$.

The corresponding operation on strings has several names in the literature. Let us call it substitution closure.

Definition. Let $\Delta$ be an alphabet and $a \in \Delta$. For a language $L$ over $\Delta$, the substitution closure of $L$ at $a$, denoted by $L^{*a}$, is defined to be $\bigcup_{n=0}^{\infty} X_n$, where $X_0 = \{a\}$ and, for $n \ge 0$, $X_{n+1} = X_n \cdot_a (L \cup \{a\})$.

Exercise. Let $a \in \Sigma_0$, $a \ne e$, and let $L \subseteq T_\Sigma$. Prove that $\mathrm{yield}(L^{*a}) = (\mathrm{yield}(L))^{*a}$.

Theorem. RECOG is closed under tree concatenation closure.

Proof. Again the proof is a straightforward generalization of the string case. Let $G = (N, \Sigma, R, S)$ be a regular tree grammar in normal form, and let $a \in \Sigma_0$. Construct the regular tree grammar $G' = (N \cup \{S_0\}, \Sigma, R', S_0)$, where $R' = R \cup \{A \to S \mid A \to a \text{ is in } R\} \cup \{S_0 \to S, S_0 \to a\}$. Then $L(G') = (L(G))^{*a}$.

Corollary. CFL is closed under substitution closure.

Proof. Use Theorem 3.28 and Exercise [...].

It is well known that the class of regular string languages is the smallest class containing the finite languages and closed under union, concatenation and closure. A similar result holds for recognizable tree languages.

Theorem. RECOG is the smallest class of tree languages containing the finite tree languages and closed under union, tree concatenation and tree concatenation closure.

Proof. We have shown that RECOG satisfies the above conditions in Exercise 3.26 and Theorems 3.32, 3.35 and [...]. It remains to show that every recognizable tree language can be built up from the finite tree languages using the operations $\cup$, $\cdot_a$ and $^{*a}$. Let $G = (N, \Sigma, R, S)$ be a regular tree grammar (it is easy to think of it as being in normal form). We shall use the elements of $N$ to do tree concatenation at.
For $A \in N$ and $P, Q \subseteq N$ with $P \cap Q = \emptyset$, let us denote by $L^Q_{A,P}$ the set of all trees $t \in T_\Sigma(P)$ for which there is a derivation $A \Rightarrow t_1 \Rightarrow t_2 \Rightarrow \cdots \Rightarrow t_n \Rightarrow t_{n+1} = t$ ($n \ge 0$) such that, for $1 \le i \le n$, $t_i \in T_\Sigma(Q \cup P)$ and a rule with left hand side in $Q$ is applied to $t_i$ to obtain $t_{i+1}$. (Footnote: Recall the notation $L_1 \cdot_a L_2$ from Definition [...].) We shall show, by induction on the cardinality of $Q$, that all sets $L^Q_{A,P}$ can be built up from the finite tree languages by the operations $\cup$, $\cdot_B$ and $^{*B}$ (for all $B \in N$). For $Q = \emptyset$, $L^{\emptyset}_{A,P}$ is the set of all those right hand sides of rules with left hand side $A$, that are in $T_\Sigma(P)$. Thus $L^{\emptyset}_{A,P}$ is a finite tree language for all $A$ and $P$. Assuming now that, for $Q \subseteq N$, all sets $L^Q_{A,P}$ can be built up from the finite tree languages, the same holds for all sets $L^{Q \cup \{B\}}_{A,P}$, where $B \in N - Q$, since
$$L^{Q \cup \{B\}}_{A,P} = L^Q_{A,P \cup \{B\}} \cdot_B \left(L^Q_{B,P \cup \{B\}}\right)^{*B} \cdot_B L^Q_{B,P}$$
(a formal proof of this equation is left to the reader). Thus, since $L(G) = L^N_{S,\emptyset}$, the theorem is proved.

In other words, each recognizable tree language can be denoted by a regular expression with trees as constants and $\cup$, $\cdot_A$ and $^{*A}$ as operators.

Exercise. Try to find a regular expression for the language generated by the regular tree grammar $G = (N, \Sigma, R, S)$ with $N = \{S, T\}$, $\Sigma_0 = \{a\}$, $\Sigma_2 = \{p\}$ and $R = \{S \to p[TS], S \to a, T \to p[TT], T \to a\}$. Use the algorithm in the proof of Theorem [...].

As a corollary we obtain the result that all context-free languages can be denoted by context-free expressions.

Corollary. CFL is the smallest class of languages containing the finite languages and closed under union, substitution and substitution closure.

Proof. Exercise.

Exercise. Define the operation of iterated concatenation at $a$ (for tree languages) and iterated substitution at $a$ (for string languages) by $\mathrm{it}_a(L) = L^{*a} \cdot_a \emptyset$. Prove (using Theorem 3.43) that RECOG is the smallest class of tree languages containing the finite tree languages and closed under the operations of union, top concatenation and iterated concatenation. Show that this implies that CFL is the smallest class of languages containing the finite languages and closed under the operations of union, concatenation and iterated substitution (cf. [Sal, VI.11]).

Let us now turn to another operation on trees: that of relabeling the nodes of a tree.

Definition. Let $\Sigma$ and $\Delta$ be ranked alphabets. A relabeling $r$ is a family $\{r_k\}_{k \ge 0}$ of mappings $r_k : \Sigma_k \to P(\Delta_k)$.
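A relabeling determines a mapping $r : T_\Sigma \to P(T_\Delta)$ by the requirements
(i) for $a \in \Sigma_0$, $r(a) = r_0(a)$;
(ii) for $k \ge 1$, $a \in \Sigma_k$ and $t_1, \ldots, t_k \in T_\Sigma$, $r(a[t_1 \cdots t_k]) = \{b[s_1 \cdots s_k] \mid b \in r_k(a) \text{ and } s_i \in r(t_i)\}$.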
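The two requirements translate directly into a recursive function. This is a minimal sketch, assuming trees are nested tuples and the family $\{r_k\}$ is given as one dictionary keyed by (label, rank); the names `relabel` and `R` are hypothetical, and a symbol missing from the dictionary is treated as having an empty image.

```python
from itertools import product

def relabel(tree, r):
    """Return the set r(tree) for a relabeling r: (label, rank) -> set of new labels."""
    label, children = tree[0], tree[1:]
    new_labels = r.get((label, len(children)), set())
    # Images of the direct subtrees; every combination of choices is allowed.
    child_images = [relabel(child, r) for child in children]
    return {(b,) + combo
            for b in new_labels
            for combo in product(*child_images)}

# Hypothetical relabeling: S keeps its label, each leaf a may become c or d.
R = {("S", 2): {"S"}, ("a", 0): {"c", "d"}}

images = relabel(("S", ("a",), ("a",)), R)
print(len(images))                          # 4
print(("S", ("c",), ("d",)) in images)      # True
```

Different occurrences of the same symbol are relabeled independently, which is why $S[aa]$ has four images here.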

If, for each $k \ge 0$ and each $a \in \Sigma_k$, $r_k(a)$ consists of one element only, then $r$ is called a projection. Obviously, RECOG is closed under relabelings.

Theorem. RECOG is closed under relabelings.

Proof. Let $r$ be a relabeling, and consider some regular tree grammar $G$. By replacing each rule $A \to t$ of $G$ by all rules $A \to s$, $s \in r(t)$, one obtains a regular tree grammar for $r(L(G))$. (In order that $r(t)$ makes sense, we define $r(B) = \{B\}$ for each nonterminal $B$ of $G$.)

We are now in a position to study the connection between recognizable tree languages and sets of derivation trees of context-free grammars. We shall consider two kinds of derivation trees. First we define the ordinary kind of derivation tree (cf. Example 1.1).

Definition. Let $G = (N, \Sigma, R, S)$ be a context-free grammar. Let $\Delta$ be the ranked alphabet such that $\Delta_0 = \Sigma \cup \{e\}$ and, for $k \ge 1$, $\Delta_k$ is the set of nonterminals $A \in N$ for which there is a rule $A \to w$ with $|w| = k$ (in case $k = 1$: $|w| = 1$ or $|w| = 0$). For each $\alpha \in N \cup \Sigma$, the set of derivation trees with top $\alpha$, denoted by $D^\alpha_G$, is the tree language over $\Delta$ defined recursively as follows
(i) for each $a$ in $\Sigma$, $a \in D^a_G$;
(ii) for each rule $A \to \alpha_1 \cdots \alpha_n$ in $R$ ($n \ge 1$, $A \in N$, $\alpha_i \in \Sigma \cup N$), if $t_i \in D^{\alpha_i}_G$ for $1 \le i \le n$, then $A[t_1 \cdots t_n] \in D^A_G$;
(iii) for each rule $A \to \lambda$ in $R$, $A[e] \in D^A_G$.

Definition. A tree language $L$ is said to be local if, for some context-free grammar $G = (N, \Sigma, R, S)$ and some set of symbols $V \subseteq N \cup \Sigma$, $L = \bigcup_{\alpha \in V} D^\alpha_G$.

Exercise. Show that each local tree language is recognizable.

Note that a local tree language is the set of all derivation trees of a context-free grammar which has a set of initial symbols (instead of one initial nonterminal). The reason for the name local is that such a tree language $L$ is determined by (1) a finite set of trees of height one, (2) a finite set of initial symbols, (3) a finite set of final symbols, and the requirement that $L$ consists of all trees $t$ such that each node of $t$ together with its direct descendants belongs to (1), the top label of $t$ belongs to (2), and the leaf labels of $t$ to (3).

We now show that the class of local tree languages is properly included in RECOG.
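The description of a local tree language by the three finite sets (1), (2) and (3) gives an immediate membership test. The following sketch assumes trees are nested tuples and encodes a tree of height one as a pair (parent label, tuple of child labels); the names `in_local_language`, `nodes_ok`, `HEIGHT_ONE`, `INITIAL` and `FINAL` are hypothetical.

```python
def nodes_ok(tree, height_one, final):
    """Check every node with its direct descendants against (1) and every leaf against (3)."""
    label, children = tree[0], tree[1:]
    if not children:
        return label in final
    if (label, tuple(child[0] for child in children)) not in height_one:
        return False
    return all(nodes_ok(child, height_one, final) for child in children)

def in_local_language(tree, height_one, initial, final):
    """Membership in the local tree language determined by the three finite sets."""
    return tree[0] in initial and nodes_ok(tree, height_one, final)

# Hypothetical local language {S[ab], S[ba]} over Sigma_0 = {a, b}, Sigma_2 = {S}.
HEIGHT_ONE = {("S", ("a", "b")), ("S", ("b", "a"))}
INITIAL = {"S"}
FINAL = {"a", "b"}

print(in_local_language(("S", ("a",), ("b",)), HEIGHT_ONE, INITIAL, FINAL))  # True
print(in_local_language(("S", ("a",), ("a",)), HEIGHT_ONE, INITIAL, FINAL))  # False
```

The test only ever inspects a node together with its direct descendants, so no information can be passed between distant parts of the tree.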
Theorem. There are recognizable tree languages which are not local.

Proof. Let $\Sigma_0 = \{a, b\}$ and $\Sigma_2 = \{S\}$. Consider the tree language $L = \{S[S[ab]S[ba]]\}$. Obviously $L$ is recognizable. Suppose that $L$ is local. Then there is a context-free grammar $G$ such that $D^S_G = L$. Thus $S \to SS$, $S \to ab$ and $S \to ba$ are rules of $G$. But then $S[S[ab]S[ab]] \in L$. Contradiction.

Note that the recognizable tree language $L$ in the above proof can be recognized by a deterministic top-down fta. Note also that the tree language given in the proof of Theorem 3.14 is local. Hence the local tree languages and the tree languages recognized by det. top-down fta are incomparable.

Exercise. Find a recognizable tree language which is neither local nor recognizable by a det. top-down fta.

It is clear that, if $\Sigma_0 = \{a, b\}$ and $\Sigma_2 = \{S_1, S_2, S_3\}$, then $L' = \{S_1[S_2[ab]S_3[ba]]\}$ is a local language. Hence the language $L$ in Theorem 3.52 is the projection of the local language $L'$ (project $S_1$, $S_2$ and $S_3$ on $S$). We will show that this is true in general: each recognizable tree language is the projection of a local tree language. In fact we shall show a slightly stronger fact. To do this we define the second type of derivation tree of a context-free grammar, called rule tree.

Definition. Let $G = (N, \Sigma, R, S)$ be a context-free grammar. Let $\bar{R}$ be any set of symbols in one-to-one correspondence with $R$, $\bar{R} = \{\bar{r} \mid r \in R\}$. Each element of $\bar{R}$ is given a rank such that, if $r$ in $R$ is of the form $A \to w_0 A_1 w_1 A_2 w_2 \cdots A_k w_k$ (for some $k \ge 0$, $A_1, \ldots, A_k \in N$ and $w_0, w_1, \ldots, w_k \in \Sigma^*$), then $\bar{r} \in \bar{R}_k$. The set of rule trees of $G$, denoted by $RT(G)$, is defined to be the tree language generated by the regular tree grammar $\bar{G} = (N, \bar{R}, P, S)$, where $P$ is defined by
(i) if $r = (A \to w_0 A_1 \cdots w_{k-1} A_k w_k)$, $k \ge 1$, is in $R$, then $A \to \bar{r}[A_1 \cdots A_k]$ is in $P$;
(ii) if $r = (A \to w_0)$ is in $R$, then $A \to \bar{r}$ is in $P$.

Definition. We shall say that a tree language $L$ is a rule tree language if $L = RT(G)$ for some context-free grammar $G$.

Thus, a rule tree is a derivation tree in which the nodes are labeled by the rules applied during the derivation. It should be obvious that for each context-free grammar $G = (N, \Sigma, R, S)$ there is a one-to-one correspondence between the tree languages $RT(G)$ and $D^S_G$.

Example. Consider Example 1.1. For each rule $r$ in that example, let $(r)$ stand for a new symbol.
The rule tree corresponding to the derivation tree displayed in Example 1.1 is
[Figure: rule tree corresponding to the derivation tree of Example 1.1.]

Footnote: Other examples are for instance $\{S[T[a]T[b]]\}$ and $\{S[S[a]]\}$.
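The passage from a context-free grammar to the regular tree grammar that generates its rule trees can be sketched in a few lines. This is an illustration under stated assumptions only: rules are pairs of a left hand side and a list of right hand side symbols, the new rule symbols are named `r1`, `r2`, ... in order of appearance, and the names `rule_tree_grammar`, `CFG_RULES` and the sample grammar itself are hypothetical.

```python
def rule_tree_grammar(rules, nonterminals):
    """Productions P of the regular tree grammar generating the rule trees.

    For a rule r = A -> w0 A1 w1 ... Ak wk the production is A -> r[A1 ... Ak];
    for k = 0 it is A -> r. Right hand sides are nested tuples (label, child_1, ...).
    """
    productions = []
    for index, (lhs, rhs) in enumerate(rules, start=1):
        symbol = f"r{index}"  # hypothetical name for the new symbol of rule number index
        children = tuple((x,) for x in rhs if x in nonterminals)
        productions.append((lhs, (symbol,) + children))
    return productions

# Hypothetical context-free grammar: S -> a S b, S -> lambda.
CFG_RULES = [("S", ["a", "S", "b"]), ("S", [])]

for lhs, rhs in rule_tree_grammar(CFG_RULES, {"S"}):
    print(lhs, "->", rhs)
```

The rank of each new symbol is the number of nonterminal occurrences in the right hand side of its rule, so here `r1` has rank 1 and `r2` has rank 0.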
