Composition and Decomposition of Extended Multi Bottom-Up Tree Transducers*


Joost Engelfriet (1), Eric Lilin (2), and Andreas Maletti (3, **)

(1) Leiden Institute of Advanced Computer Science, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands, engelfri@liacs.nl
(2) Université des Sciences et Technologies de Lille, UFR IEEA, 59655 Villeneuve d'Ascq, France, eric.lilin@lifl.fr
(3) International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704, USA, maletti@icsi.berkeley.edu

Abstract. Extended multi bottom-up tree transducers are defined and investigated. They are an extension of multi bottom-up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers can compute all transformations that are computed by linear extended top-down tree transducers (which are a theoretical model for syntax-based machine translation). Moreover, the classical composition results for bottom-up tree transducers are generalized to extended multi bottom-up tree transducers. Finally, a characterization in terms of extended top-down tree transducers is presented.

1 Introduction

The area of machine translation recently embraced syntactical translation models [23], which rely on the syntactical structure of a sentence to perform the translation. The new subfield of syntax-based machine translation [7] was successfully established, and the first results look very promising. However, there are two major problems: (i) the syntax-based approach is computationally much more expensive (up to the point of computational infeasibility) than the classical phrase-based approach, and (ii) there currently is a wealth of formal models that compete to become the implementation standard that finite-state transducers [21] became for phrase-based machine translation.
* This is an extended and revised version of: J. Engelfriet, E. Lilin, A. Maletti. Extended Multi Bottom-up Tree Transducers. Proc. 12th Int. Conf. Developments in Language Theory. LNCS 5257, 289-300, Springer-Verlag 2008.
** Author was supported by a fellowship within the Postdoc-Programme of the German Academic Exchange Service (DAAD).

In syntax-based machine translation, the translation of an input sentence uses not only the words of the sentence and the context in which they appear, but also the syntactic structure of the sentence. Thus, the input sentence is first parsed and the transformation is then started on the syntax trees so obtained. We can distinguish two modes of translation: tree-to-string and tree-to-tree. In the former mode, the translation device immediately generates sentences of the output language, whereas in the latter mode the translation device translates syntax trees of an input sentence into syntax trees of output sentences. In this contribution we will focus on the tree-to-tree mode; in fact, most tree-to-tree devices can be turned into tree-to-string devices by only considering the yield (the sequence of nullary output symbols) of the output they generate.

Table 1. Overview of formal models with respect to desired criteria. "x" marks fulfillment; "-" marks failure to fulfill. A question mark shows that this remains open though we conjecture fulfillment.

Model \ Criterion                                                     (a)  (b)  (c)  (d)
Linear nondeleting top-down tree transducer [32, 38]                   -    x    -    x
Quasi-alphabetic tree bimorphism [37]                                  -    ?    -    x
Synchronous context-free grammar [1]                                   x    x    -    x
Synchronous tree substitution grammar [34]                             x    x    x    -
Synchronous tree adjoining grammar [36, 35, 5]                         x    x    x    -
Linear complete tree bimorphism [5]                                    x    x    x    -
Linear nondeleting extended top-down tree transducer [3, 4, 20, 29]    x    x    x    -
Linear multi bottom-up tree transducer [13, 14, 28]                    -    ?    x    x
Linear extended multi bottom-up tree transducer [this paper]           x    ?    x    x

Knight [22, 23] proposed the following criteria that any reasonable formal tree-to-tree model of syntax-based machine translation should fulfill:
(a) It should be a genuine generalization of finite-state transducers [21]; this includes the use of epsilon rules, i.e., rules that do not consume any part of the input tree.
(b) It should be efficiently trainable.
(c) It should be able to handle rotations (on the tree level).
(d) Its induced class of transformations should be closed under composition.
Graehl and Knight [20] proposed the linear and nondeleting extended (top-down) tree transducer (ln-xtt) [3, 4] as a suitable formal model. It fulfills (a)-(c) but fails to fulfill (d); see [28, Corollary 5] and [29, Lemma 5.2] for the failure of (d). Further models were proposed but, to the authors' knowledge, they all fail at least one criterion. Table 1 shows some important models and their properties.

Let us explain why the closure under composition is important. First, it is infeasible to train one transducer that handles the complete machine translation task. Instead, "small" task-specific transducers (like transducers that reorder the syntax tree, insert output subtrees that have no counterpart in the input syntax tree, or do word-by-word translation [41]) are trained. Once those "small" transducers are trained, we would like to compose them into a single transducer, so that further operations (like composing with a language model transducer, obtaining a regular tree grammar [16, 17] for the output syntax trees generated for a specific input syntax tree, etc.) can be applied to a single transducer and not a chain of transducers. Second, the success of finite-state transducer toolkits (like Carmel [19] and OpenFsm [2]) is largely due to the fact that they rely on only one model. This simplifies the usage of the toolkits and allows rapid development of translation systems.

We propose a formal model that satisfies criteria (a), (c), and (d), and has more expressive power than the ln-xtt. The device is called linear extended multi bottom-up tree transducer, and it is as powerful as the linear model of [13, 14, 28] enhanced by epsilon rules (as shown in Theorem 5). In this paper we formally define and investigate the extended multi bottom-up tree transducer (xmbutt) and various restrictions (e.g., linear, nondeleting, and deterministic). Note that we consider the xmbutt in general, not just its linear restriction.

We start with normal forms for xmbutts. First, we construct for every xmbutt an equivalent nondeleting xmbutt (see Theorem 3). This can be achieved by guessing the required translations. Though the construction preserves linearity, it obviously destroys determinism. Next, we present a one-symbol normal form for xmbutts (see Theorem 5): each rule of the xmbutt either consumes one input symbol (without producing output), or produces one output symbol (without consuming input). This normal form preserves all three restrictions above.

Our main result (Theorem 15) states that the class of transformations computed by xmbutts is closed under pre-composition with transformations computed by linear xmbutts and under post-composition with those computed by deterministic xmbutts. In particular, we also obtain that the classes of transformations computed by linear and/or deterministic xmbutts are closed under composition.
These results are analogous to classical results (see [8, Theorems 4.5 and 4.6] and [6, Corollary 7]) for bottom-up tree transducers [39, 8] and thus show the "bottom-up" nature of xmbutts. Also, they are analogous to the composition results of [28, Theorem 11] for (non-extended) multi bottom-up tree transducers. As in [28], our proof essentially uses the principle set forth in [6, Theorem 6], but the one-symbol normal form allows us to present a very simple composition construction for xmbutts and verify that it is correct, provided that the first input transducer is linear or the second is deterministic. We observe here that the "extension" of a tree transducer model (or even just the addition of epsilon rules) can, in general, destroy closure under composition, as can be seen from the linear and nondeleting top-down tree transducer. This seems to be due to the non-existence of a one-symbol normal form in the top-down case.

We also verify that linear xmbutts have sufficient power for syntax-based machine translation. This is because, as mentioned before (and shown in Theorem 10), they can simulate all ln-xtts. Thus, we have a lower bound to the power of linear xmbutts. In fact, even the composition closure of the class of transformations computed by ln-xtts is strictly contained in the class of transformations computed by linear xmbutts. Finally, we also present exact characterizations (Theorems 8 and 16): xmbutts are as powerful as compositions of an ln-xtt with a deterministic top-down tree transducer. In the linear case the latter transducer has the so-called single-use property [15, 18, 24, 25, 11], and similar results hold in the deterministic case. Thus, the composition of two extended top-down tree transducers forms an upper bound to the power of the linear xmbutt. We summarize the obtained results in an inclusion diagram (see Fig. 4). As a side-result we obtain that linear xmbutts admit a semantics based on recognizable rule tree languages. This suggests that linear xmbutts also satisfy criterion (b).

2 Preliminaries

Let $A$, $B$, $C$ be sets. A relation from $A$ to $B$ is a subset of $A \times B$. Let $\tau_1 \subseteq A \times B$ and $\tau_2 \subseteq B \times C$. The composition of $\tau_1$ and $\tau_2$ is

$\tau_1 ; \tau_2 = \{(a, c) \mid \exists b \in B \colon (a, b) \in \tau_1, (b, c) \in \tau_2\}$.

This composition is lifted to classes of relations in the usual manner. The nonnegative integers are denoted by $\mathbb{N}$, and $\{i \mid 1 \le i \le k\}$ is denoted by $[k]$.

A ranked set is a set $\Sigma$ of symbols with a relation $\mathrm{rk} \subseteq \Sigma \times \mathbb{N}$ such that $\{k \mid (\sigma, k) \in \mathrm{rk}\}$ is finite for every $\sigma \in \Sigma$. Commonly, we denote the ranked set only by $\Sigma$ and the set of $k$-ary symbols of $\Sigma$ by $\Sigma^{(k)} = \{\sigma \in \Sigma \mid (\sigma, k) \in \mathrm{rk}\}$. We also denote that $\sigma \in \Sigma^{(k)}$ by writing $\sigma^{(k)}$. Given two ranked sets $\Sigma$ and $\Delta$ with associated rank relations $\mathrm{rk}_\Sigma$ and $\mathrm{rk}_\Delta$, respectively, the set $\Sigma \cup \Delta$ is associated with the rank relation $\mathrm{rk}_\Sigma \cup \mathrm{rk}_\Delta$. A ranked set $\Sigma$ is uniquely-ranked if for every $\sigma \in \Sigma$ there exists exactly one $k$ such that $(\sigma, k) \in \mathrm{rk}$. For uniquely-ranked sets, we denote this $k$ simply by $\mathrm{rk}(\sigma)$. An alphabet is a finite set, and a ranked alphabet is a ranked set $\Sigma$ such that $\Sigma$ is an alphabet.

Let $\Sigma$ be a ranked set. The set of $\Sigma$-trees, denoted by $T_\Sigma$, is the smallest set $T$ such that $\sigma(t_1, \dots, t_k) \in T$ for every $k \in \mathbb{N}$, $\sigma \in \Sigma^{(k)}$, and $t_1, \dots, t_k \in T$. We write $\alpha$ instead of $\alpha()$ if $\alpha \in \Sigma^{(0)}$. Let $\Gamma \subseteq \Sigma$ and $H \subseteq T_\Sigma$. By $\Gamma(H)$ we denote $\{\gamma(t_1, \dots, t_k) \mid \gamma \in \Gamma^{(k)}, t_1, \dots, t_k \in H\}$. Now, let $\Delta$ be a ranked set. We denote by $T_\Delta(H)$ the smallest set $T \subseteq T_{\Sigma \cup \Delta}$ such that $H \subseteq T$ and $\Delta(T) \subseteq T$. Let $t \in T_\Sigma$. The set of positions of $t$, denoted by $\mathrm{pos}(t)$, is defined by $\mathrm{pos}(\sigma(t_1, \dots, t_k)) = \{\varepsilon\} \cup \{iw \mid i \in [k], w \in \mathrm{pos}(t_i)\}$ for every $\sigma \in \Sigma^{(k)}$ and $t_1, \dots, t_k \in T_\Sigma$.
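The composition of relations and the set of positions can be made concrete with a minimal Python sketch. The encoding is an assumption made only for illustration: a tree with root symbol sigma and subtrees t_1, ..., t_k is a tuple `(sigma, t_1, ..., t_k)`, a position is a tuple of child indices, and the empty tuple plays the role of the empty string. The names `compose` and `positions` are hypothetical and do not come from the paper.

```python
def compose(tau1, tau2):
    """tau1 ; tau2 = {(a, c) | there is b with (a, b) in tau1 and (b, c) in tau2}."""
    return {(a, c) for (a, b1) in tau1 for (b2, c) in tau2 if b1 == b2}


def positions(t):
    """pos(t): each position is a tuple of child indices; () stands for epsilon."""
    result = {()}
    for i, child in enumerate(t[1:], start=1):
        result |= {(i,) + w for w in positions(child)}
    return result


# sigma(alpha, gamma(alpha)) with sigma binary, gamma unary, and alpha nullary
t = ("sigma", ("alpha",), ("gamma", ("alpha",)))
tau1 = {(1, "a"), (2, "b")}
tau2 = {("a", "x"), ("b", "y"), ("c", "z")}
```

With these inputs, `compose(tau1, tau2)` relates 1 to "x" and 2 to "y", and `positions(t)` contains the root, both children, and the single grandchild.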
Note that we denote the empty string by $\varepsilon$ and that $\mathrm{pos}(t) \subseteq \mathbb{N}^*$. Let $w \in \mathrm{pos}(t)$ and $u \in T_\Sigma$. The subtree of $t$ that is rooted in $w$ is denoted by $t|_w$, the symbol of $t$ at $w$ is denoted by $t(w)$, and the tree obtained from $t$ by replacing the subtree rooted at $w$ by $u$ is denoted by $t[u]_w$. For every $\Gamma \subseteq \Sigma$ and $\sigma \in \Sigma$, let $\mathrm{pos}_\Gamma(t) = \{w \in \mathrm{pos}(t) \mid t(w) \in \Gamma\}$ and $\mathrm{pos}_\sigma(t) = \mathrm{pos}_{\{\sigma\}}(t)$.

Let $X = \{x_i \mid i \ge 1\}$ be a set of formal variables; each variable is considered to have the unique rank $0$. In what follows, we assume that the input, output, and state alphabets of tree transducers do not contain variables. A tree $t \in T_\Sigma(X)$ is linear (respectively, nondeleting) in $V \subseteq X$ if $\mathrm{card}(\mathrm{pos}_v(t)) \le 1$ (respectively, $\mathrm{card}(\mathrm{pos}_v(t)) \ge 1$) for every $v \in V$. The set of variables of $t$ is $\mathrm{var}(t) = \{v \in X \mid \mathrm{pos}_v(t) \ne \emptyset\}$, and the sequence of variables is given by $\mathrm{yield}_X \colon T_\Sigma(X) \to X^*$ with $\mathrm{yield}_X(v) = v$ for every $v \in X$ and

$\mathrm{yield}_X(\sigma(t_1, \dots, t_k)) = \mathrm{yield}_X(t_1) \cdots \mathrm{yield}_X(t_k)$

for every $\sigma \in \Sigma^{(k)}$ and $t_1, \dots, t_k \in T_\Sigma(X)$. A tree $t \in T_\Sigma(X)$ is normalized if $\mathrm{yield}_X(t) = x_1 \cdots x_m$ for some $m \in \mathbb{N}$. Every mapping $\theta \colon X \to T_\Sigma(X)$ is a substitution. We define the application of $\theta$ to a tree in $T_\Sigma(X)$ inductively by $v\theta = \theta(v)$ for every $v \in X$ and $\sigma(t_1, \dots, t_k)\theta = \sigma(t_1\theta, \dots, t_k\theta)$ for every $\sigma \in \Sigma^{(k)}$ and $t_1, \dots, t_k \in T_\Sigma(X)$.

An extended (top-down) tree transducer (xtt, or transducteur généralisé descendant) [3, 4] is a tuple $M = (Q, \Sigma, \Delta, I, R)$ where $Q$ is a uniquely-ranked alphabet such that $Q = Q^{(1)}$, $\Sigma$ and $\Delta$ are ranked alphabets, $I \subseteq Q$, and $R$ is a finite set of rules of the form $l \to r$ with $l \in Q(T_\Sigma(X))$ linear in $X$, and $r \in T_\Delta(Q(\mathrm{var}(l)))$. The xtt $M$ is linear (respectively, nondeleting) if $r$ is linear (respectively, nondeleting) in $\mathrm{var}(l)$ for every $l \to r \in R$. It is epsilon-free if $l \notin Q(X)$ for every $l \to r \in R$. The semantics of the xtt is given by term rewriting. Let $\xi, \zeta \in T_\Delta(Q(T_\Sigma))$. We write $\xi \Rightarrow_M \zeta$ if there exist a rule $l \to r \in R$, a position $w \in \mathrm{pos}(\xi)$, and a substitution $\theta \colon X \to T_\Sigma$ such that $\xi|_w = l\theta$ and $\zeta = \xi[r\theta]_w$. The tree transformation computed by $M$ is the relation $\tau_M = \{(t, u) \in T_\Sigma \times T_\Delta \mid \exists q \in I \colon q(t) \Rightarrow_M^* u\}$. In Fig. 1 we display two example rules and a derivation step to illustrate the above definitions. The class of all tree transformations computed by xtts is denoted by XTOP. The prefixes 'l', 'n', and 'e' are used to restrict to linear, nondeleting, and epsilon-free devices, respectively. Thus, ln-XTOP denotes the class of all tree transformations computable by linear and nondeleting xtts.

[Fig. 1. Two example rules of an xtt $M$ and a derivation step using the first rule.]

An xtt $M = (Q, \Sigma, \Delta, I, R)$ is a top-down tree transducer [32, 38] if for every rule $l \to r \in R$ there exist $q \in Q$ and $\sigma \in \Sigma^{(k)}$ such that $l = q(\sigma(x_1, \dots, x_k))$. The top-down tree transducer $M$ is deterministic if (i) $\mathrm{card}(I) = 1$ and (ii) for every $l \in Q(\Sigma(X))$ there exists at most one $r$ such that $l \to r \in R$.
Finally, $M$ is single-use [15, 18, 24, 25] if for every $q(v) \in Q(X)$, $k \in \mathbb{N}$, and $\sigma \in \Sigma^{(k)}$ there exist at most one $l \to r \in R$ and $w \in \mathrm{pos}(r)$ such that $l(1) = \sigma$ and $r|_w = q(v)$. We use TOP and TOP_su to denote the classes of transformations computed by top-down tree transducers and single-use top-down tree transducers, respectively. We also use the prefixes 'l', 'n', and 'd' to restrict to linear, nondeleting, and deterministic devices, respectively.
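The tree operations from the preliminaries (subtree, replacement, sequence of variables, linearity, nondeletion, and substitution) can be sketched in Python as follows. This is only an illustration under assumed conventions: trees are tuples `(symbol, child_1, ..., child_k)`, variables are plain strings such as `"x1"`, positions are tuples of child indices, and a substitution is a dictionary that acts as the identity on variables it does not mention. All function names are hypothetical.

```python
def subtree(t, w):
    """t|_w: the subtree of t rooted at position w."""
    for i in w:
        t = t[i]
    return t


def replace(t, w, u):
    """t[u]_w: t with the subtree at position w replaced by u."""
    if not w:
        return u
    i = w[0]
    return t[:i] + (replace(t[i], w[1:], u),) + t[i + 1:]


def yield_vars(t):
    """yield_X(t): the variables of t, read from left to right."""
    if isinstance(t, str):
        return [t]
    return [v for child in t[1:] for v in yield_vars(child)]


def is_linear(t, V):
    """Every variable of V occurs at most once in t."""
    seq = yield_vars(t)
    return all(seq.count(v) <= 1 for v in V)


def is_nondeleting(t, V):
    """Every variable of V occurs at least once in t."""
    seq = yield_vars(t)
    return all(seq.count(v) >= 1 for v in V)


def substitute(t, theta):
    """Apply theta to t; unmentioned variables stay unchanged (an assumption)."""
    if isinstance(t, str):
        return theta.get(t, t)
    return (t[0],) + tuple(substitute(c, theta) for c in t[1:])


# A left-hand side q(sigma(x1, x2)) and a right-hand side delta(p(x1), p(x1)).
l = ("q", ("sigma", "x1", "x2"))
r = ("delta", ("p", "x1"), ("p", "x1"))
theta = {"x1": ("alpha",), "x2": ("beta",)}
```

Here `l` is linear and nondeleting in {x1, x2}, whereas `r` copies x1 and drops x2, so a rule with this right-hand side would be neither linear nor nondeleting.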

We use the standard notion of a recognizable (or regular) tree language [16, 17], and we denote the class of all recognizable tree languages by REC. Moreover, we let $\mathrm{Rec}(\Sigma) = \{L \subseteq T_\Sigma \mid L \in \mathrm{REC}\}$. For a class $\mathcal{T}$ of tree transformations, $\mathcal{T}(\mathrm{REC}) = \{\tau(L) \mid \tau \in \mathcal{T} \text{ and } L \in \mathrm{REC}\}$.

Finally, we recall top-down tree transducers with regular look-ahead [9]. A top-down tree transducer with regular look-ahead is a pair $\langle M, c \rangle$ such that $M = (Q, \Sigma, \Delta, I, R)$ is a top-down tree transducer and $c \colon R \to \mathrm{Rec}(\Sigma)$. We say that such a transducer $\langle M, c \rangle$ is deterministic if (i) $\mathrm{card}(I) = 1$ and (ii) for every $l \in Q(\Sigma(X))$ and $t \in T_\Sigma$ there exists at most one $r$ such that $l \to r \in R$, $l(1) = t(\varepsilon)$, and $t \in c(l \to r)$. Similarly, $\langle M, c \rangle$ is single-use (called strongly single use restricted in [11, Definition 5.5]) if for every $q(v) \in Q(X)$ and $t \in T_\Sigma$ there exist at most one $l \to r \in R$ and $w \in \mathrm{pos}(r)$ such that $l(1) = t(\varepsilon)$, $t \in c(l \to r)$, and $r|_w = q(v)$. The semantics $\Rightarrow_{\langle M, c \rangle}$ of $\langle M, c \rangle$ is defined in the same manner as for an xtt with the additional restriction that $l\theta|_1 \in c(l \to r)$. The computed tree transformation $\tau_{\langle M, c \rangle}$ is defined as for an xtt. We use TOP^R and TOP^R_su to denote the classes of transformations computed by top-down tree transducers with regular look-ahead and single-use top-down tree transducers with regular look-ahead, respectively. We use the prefix 'd' in the usual manner. For further information on recognizable tree languages and tree transducers, we refer the reader to [16, 17].

3 Extended Multi Bottom-up Tree Transducers

In this section, we recall S-transducteurs ascendants généralisés [26], which are a generalization of S-transducteurs ascendants (STA) [26, 27]. We choose to call them extended multi bottom-up tree transducers here in line with [13, 14, 28], where 'multi' refers to the fact that states may have ranks different from one.

Definition 1.
An extended multi bottom-up tree transducer (xmbutt) is a tuple $(Q, \Sigma, \Delta, F, R)$ where
- $Q$ is a uniquely-ranked alphabet of states, disjoint with $\Sigma \cup \Delta$;
- $\Sigma$ and $\Delta$ are ranked alphabets of input and output symbols, respectively;
- $F \subseteq Q \setminus Q^{(0)}$ is a set of final states; and
- $R$ is a finite set of rules of the form $l \to r$ where $l \in T_\Sigma(Q(X))$ is linear in $X$ and $r \in Q(T_\Delta(\mathrm{var}(l)))$.

A rule $l \to r \in R$ is an epsilon rule if $l \in Q(X)$; otherwise it is input-consuming. The sets of epsilon and input-consuming rules are denoted by $R^\varepsilon$ and $R^\Sigma$, respectively. An xmbutt $M = (Q, \Sigma, \Delta, F, R)$ is a multi bottom-up tree transducer (mbutt) [respectively, an STA] if $l \in \Sigma(Q(X))$ [respectively, $l \in \Sigma(Q(X)) \cup Q(X)$] for every $l \to r \in R$. Linearity and nondeletion of xmbutts are defined in the natural manner. The xmbutt $M$ is linear if $r$ is linear in $\mathrm{var}(l)$ for every rule $l \to r \in R$. Moreover, $M$ is nondeleting if (i) $F \subseteq Q^{(1)}$ and (ii) $r$ is nondeleting in $\mathrm{var}(l)$ for every $l \to r \in R$. Finally, $M$ is deterministic if (i) there do not exist two distinct rules $l_1 \to r_1 \in R$ and $l_2 \to r_2 \in R$, a substitution $\theta \colon X \to X$, and $w \in \mathrm{pos}(l_2)$ such that $l_1\theta = l_2|_w$, and (ii) there does not exist an epsilon rule $l \to r \in R$ such that $l(\varepsilon) \in F$.

Let us now present a rewrite semantics. For later use (in Section 4) we define the rewriting for ranked sets larger than $\Sigma$ and $\Delta$ (and possibly containing variables). In the rest of this section, let $M = (Q, \Sigma, \Delta, F, R)$ be an xmbutt.

Definition 2. Let $\Sigma'$ and $\Delta'$ be ranked sets disjoint with $Q$. Moreover, let $\xi, \zeta \in T_{\Sigma \cup \Sigma'}(Q(T_{\Delta \cup \Delta'}))$, $l \to r \in R$, and $w \in \mathrm{pos}(\xi)$. We write $\xi \Rightarrow_M^{l \to r, w} \zeta$ if there exists a substitution $\theta \colon X \to T_{\Delta \cup \Delta'}$ such that $\xi|_w = l\theta$ and $\zeta = \xi[r\theta]_w$. We write $\xi \Rightarrow_M^{l \to r} \zeta$ if there exists $w \in \mathrm{pos}(\xi)$ such that $\xi \Rightarrow_M^{l \to r, w} \zeta$, and we write $\xi \Rightarrow_M \zeta$ if there exists $\rho \in R$ such that $\xi \Rightarrow_M^{\rho} \zeta$. The tree transformation computed by $M$ is

$\tau_M = \{(t, \xi|_1) \mid t \in T_\Sigma, \xi \in F(T_\Delta), t \Rightarrow_M^* \xi\}$.

[Fig. 2. Two example rules of an xmbutt $M$ and a derivation step using the first rule.]

Figure 2 displays example rules and a derivation step. The xmbutt $N$ is equivalent to $M$ if $\tau_N = \tau_M$. We denote by XMBOT and MBOT the classes of tree transformations computed by xmbutts and mbutts, respectively. We use the prefixes 'l', 'n', and 'd' to restrict to linear, nondeleting, and deterministic devices, respectively. For example, ld-XMBOT denotes the class of all tree transformations computed by linear and deterministic xmbutts.

Our first result shows that every xmbutt can be turned into an equivalent nondeleting one. Unfortunately, the construction does not preserve determinism. A similar result is proved for mbutts in [28, Theorem 14], in an indirect way; here we give a direct construction.

Theorem 3. XMBOT = n-XMBOT and l-XMBOT = ln-XMBOT.

Proof. For the xmbutt $M = (Q, \Sigma, \Delta, F, R)$ we construct an equivalent nondeleting xmbutt $N = (P, \Sigma, \Delta, F', R')$. The idea is that $N$ simulates $M$ but guesses at each moment which subtrees of the states will be deleted in the remainder of the computation of $M$; such a guess is verified by $N$ at the moment that these subtrees are deleted by $M$. The set $P$ of states of $N$ consists of all pairs $\langle q, J \rangle$ with $q \in Q$ and $J \subseteq [\mathrm{rk}(q)]$, and the rank of $\langle q, J \rangle$ is $\mathrm{card}(J)$; moreover, $F' = \{\langle q, \{1\} \rangle \mid q \in F\}$. For a tree $t = q(t_1, \dots, t_k)$ with $q \in Q^{(k)}$ and $t_1, \dots, t_k \in T_\Delta(X)$, we define

$\varphi(t) = \{\langle q, \{i_1, \dots, i_m\} \rangle(t_{i_1}, \dots, t_{i_m}) \mid 1 \le i_1 < \cdots < i_m \le k\}$.

The mapping $\varphi$ is extended to trees $t \in T_\Sigma(Q(T_\Delta(X)))$ by

$\varphi(\sigma(t_1, \dots, t_k)) = \{\sigma(u_1, \dots, u_k) \mid u_i \in \varphi(t_i) \text{ for all } i \in [k]\}$

for every $\sigma \in \Sigma^{(k)}$ and $t_1, \dots, t_k \in T_\Sigma(Q(T_\Delta(X)))$. The set $R'$ of rules of $N$ is now defined to be

$\{l' \to r' \mid \exists\, l \to r \in R \colon l' \in \varphi(l), r' \in \varphi(r), \mathrm{var}(l') = \mathrm{var}(r')\}$.

By definition, $N$ is nondeleting. Since the rules of $N$ are obtained from those of $M$ by removing subtrees (and renaming states), $N$ is linear whenever $M$ is. Let $t \in T_\Sigma$, $q \in Q^{(k)}$, $J = \{i_1, \dots, i_m\} \subseteq [k]$ with $i_1 < \cdots < i_m$, and $u_{i_1}, \dots, u_{i_m} \in T_\Delta$. It can easily be shown that $t \Rightarrow_N^* \langle q, J \rangle(u_{i_1}, \dots, u_{i_m})$ if and only if there exist $u_i \in T_\Delta$ for every $i \in [k] \setminus J$ such that $t \Rightarrow_M^* q(u_1, \dots, u_k)$. The proof is by induction on the length of the derivations. For the if-direction, note that for every $l \to r \in R$ and $r' \in \varphi(r)$ there is a (unique) $l' \in \varphi(l)$ such that $l' \to r' \in R'$. Taking $q \in F$ and $J = \{1\}$ this shows that $\tau_N = \tau_M$. □

The following normal form will be at the heart of our composition construction in the next section. It says that exactly one symbol, irrespective of whether it is an input or an output symbol, occurs in every rule.

Definition 4. The rule $l \to r \in R$ is a one-symbol rule if

$\mathrm{card}(\mathrm{pos}_\Sigma(l)) + \mathrm{card}(\mathrm{pos}_\Delta(r)) = 1$.

The xmbutt $M$ is in one-symbol normal form if it has only one-symbol rules.

Note that every xmbutt in one-symbol normal form is an STA. Next, we show that every xmbutt can be transformed into an equivalent xmbutt in one-symbol normal form.

Theorem 5.
For every xmbutt $M$ there exists an equivalent xmbutt $N$ in one-symbol normal form. Moreover, if $M$ is linear (respectively, nondeleting, deterministic), then so is $N$.

Proof. Let us assume, without loss of generality, that all left-hand sides of rules of $M$ are normalized. First we take care of the left-hand sides of rules and decompose rules with more than one input symbol in the left-hand side into several rules, using (a variation of) the construction of [26, Proposition II.B.5]. Thus, we construct an STA equivalent with $M$. Choose a uniquely-ranked set $P$ and a bijection $f \colon T_\Sigma(Q(X)) \to P$ such that (i) $Q \subseteq P$, (ii) $f(q(x_1, \dots, x_n)) = q$ for every $q \in Q^{(n)}$, and (iii) $\mathrm{rk}(f(l)) = \mathrm{card}(\mathrm{var}(l))$ for every $l \in T_\Sigma(Q(X))$. In fact, we will only use $f(l)$ for normalized $l \in T_\Sigma(Q(X))$.

Let $l \to r \in R$ be input-consuming such that $l \notin \Sigma(P(X))$. Suppose that $l = \sigma(l_1, \dots, l_k)$ for some $\sigma \in \Sigma^{(k)}$ and $l_1, \dots, l_k \in T_\Sigma(Q(X))$. Moreover, let $\theta_1, \dots, \theta_k \colon X \to X$ be bijections such that $l_i\theta_i$ is normalized. Finally, for every $i \in [k]$ let $p_i = f(l_i\theta_i)$ and $r_i = p_i(x_1, \dots, x_m)$ where $m = \mathrm{rk}(p_i)$. We construct the xmbutt

$M_1 = (Q \cup \{p_1, \dots, p_k\}, \Sigma, \Delta, F, (R \setminus \{l \to r\}) \cup R_{1,1} \cup R_{1,2})$

where $R_{1,1} = \{l_i\theta_i \to r_i \mid i \in [k], l_i \notin Q(X)\}$ and $R_{1,2} = \{l' \to r\}$, in which $l'$ is the unique normalized tree of $\{\sigma\}(P(X))$ such that $l'(i) = p_i$ for every $i \in [k]$ (note that if $l_i \in Q(X)$, then $p_i = l_i(\varepsilon)$ and so $l'|_i = l|_i$). It is straightforward to prove that $\tau_{M_1} = \tau_M$, and moreover, $M_1$ is linear (respectively, nondeleting) whenever $M$ is so. Note, however, that $M_1$ need not be deterministic. Clearly, all rules of $R_{1,1}$ have fewer input symbols in the left-hand side than $l \to r$, and the left-hand side of the rule in $R_{1,2}$ is in $\Sigma(P(X))$. Hence, repeated application of the above construction (keeping $P$ and the mapping $f$ fixed) eventually yields an equivalent xmbutt $M' = (Q', \Sigma, \Delta, F, R')$ such that $l \in \Sigma(P(X))$ for each input-consuming rule $l \to r \in R'$. Moreover, due to the special choice of $P$ and $f$, and the fact that all new rules are input-consuming, the xmbutt $M'$ is deterministic if $M$ is.

Next, we remove all epsilon rules $l \to r \in R'$ such that $r \in Q'(X)$. This can be achieved with two standard constructions (that preserve nondeletion and linearity), one for the deterministic and one for the nondeterministic case.

Finally, we decompose the right-hand sides. Let $M'' = (S, \Sigma, \Delta, F, R'')$ be the xmbutt obtained so far and $l \to r \in R''$ a rule that is not a one-symbol rule.
Let $r = s(u_1, \dots, u_{i-1}, \delta(u'_1, \dots, u'_k), u_{i+1}, \dots, u_n)$ for some $s \in S^{(n)}$, $i \in [n]$, $\delta \in \Delta^{(k)}$, and $u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n, u'_1, \dots, u'_k \in T_\Delta(X)$. Also, let $q \notin S$ be a new state of rank $k + n - 1$. We construct the xmbutt $M''_1 = (S \cup \{q\}, \Sigma, \Delta, F, R''_1)$ with $R''_1 = (R'' \setminus \{l \to r\}) \cup R''_{1,1}$ where $R''_{1,1}$ contains the two rules:
- $l \to q(u_1, \dots, u_{i-1}, u'_1, \dots, u'_k, u_{i+1}, \dots, u_n)$ and
- $q(x_1, \dots, x_{k+n-1}) \to s(x_1, \dots, x_{i-1}, \delta(x_i, \dots, x_{i+k-1}), x_{i+k}, \dots, x_{k+n-1})$.

Note that $M''_1$ is linear (respectively, nondeleting, deterministic) whenever $M''$ is so. The proof of $\tau_{M''_1} = \tau_{M''}$ is straightforward and omitted. The second rule is a one-symbol rule and the first rule has one output symbol less in its right-hand side than $l \to r$. Thus, repeated application of the above procedure eventually yields an equivalent xmbutt $N$ in one-symbol normal form. □

Let us illustrate Theorem 5 on a small example.

Example 6. Consider the xmbutt $M = (Q, \Sigma, \Delta, F, R)$ such that $Q = F = \{q^{(1)}\}$, $\Sigma = \{\sigma^{(1)}, \alpha^{(0)}\}$, $\Delta = \{\delta^{(2)}, \alpha^{(0)}\}$, and

$R = \{\sigma(\alpha) \to q(\alpha),\ \sigma(q(x_1)) \to q(\delta(x_1, \alpha))\}$.

Clearly, $M$ is linear, nondeleting, and deterministic. Applying the procedure of Theorem 5 stepwise we obtain the states and rules (we only show the newly constructed states and rules, and the rules that are not one-symbol rules):
- $\{q^{(1)}, q_1^{(0)}\}$ and $\{\alpha \to q_1,\ \sigma(q_1) \to q(\alpha),\ \sigma(q(x_1)) \to q(\delta(x_1, \alpha))\}$
- $\{\dots, q_2^{(0)}\}$ and $\{\dots,\ \sigma(q_1) \to q_2,\ q_2 \to q(\alpha),\ \sigma(q(x_1)) \to q(\delta(x_1, \alpha))\}$
- $\{\dots, q_3^{(2)}\}$ and $\{\dots,\ \sigma(q(x_1)) \to q_3(x_1, \alpha),\ q_3(x_1, x_2) \to q(\delta(x_1, x_2))\}$
- $\{\dots, q_4^{(1)}\}$ and $\{\dots,\ \sigma(q(x_1)) \to q_4(x_1),\ q_4(x_1) \to q_3(x_1, \alpha),\ \dots\}$

Note that, for example, $f(\alpha) = q_1$ and $f(q(x_1)) = q$ (see the proof of Theorem 5 for the use of $f$). Overall, we obtain the xmbutt $N = (P, \Sigma, \Delta, F, R')$ with $P = \{q^{(1)}, q_1^{(0)}, q_2^{(0)}, q_3^{(2)}, q_4^{(1)}\}$, where $R'$ contains the rules

$\alpha \to q_1$, $q_2 \to q(\alpha)$, $q_4(x_1) \to q_3(x_1, \alpha)$,
$\sigma(q_1) \to q_2$, $\sigma(q(x_1)) \to q_4(x_1)$, $q_3(x_1, x_2) \to q(\delta(x_1, x_2))$. □

If the xmbutt $M$ is deterministic, then for every $t \in T_\Sigma$ there exists at most one $\xi \in Q(T_\Delta)$ such that (i) $t \Rightarrow_M^* \xi$ and (ii) there exists no $\zeta$ such that $\xi \Rightarrow_M \zeta$. Hence, $\tau_M$ is a partial function in that case. Moreover, if $M$ is a deterministic mbutt and $t \in T_\Sigma$, then there exists at most one $\xi \in Q(T_\Delta)$ such that $t \Rightarrow_M^* \xi$. Finally, $M$ is total if for every $t \in T_\Sigma$ there exists $\xi \in Q(T_\Delta)$ such that $t \Rightarrow_M^* \xi$. It is easy to see that every xmbutt can be turned into an equivalent total one. This extends a classical result for bottom-up tree transducers [16, Lemma IV.3.5]. We will need it only for deterministic mbutts. A similar result is also stated in [28, Lemma 8].

Lemma 7. For every deterministic mbutt $M$ there exists an equivalent total deterministic mbutt $N$. Moreover, if $M$ is linear, then so is $N$.

Proof. We use a construction that is similar to the one for bottom-up tree transducers [16, Lemma IV.3.5]. Specifically, let $M = (Q, \Sigma, \Delta, F, R)$ and let $\bot \notin Q$ be a new state of rank $0$. We construct the mbutt $N = (Q', \Sigma, \Delta, F, R \cup R')$ where $Q' = Q \cup \{\bot^{(0)}\}$ and $R'$ contains

$\sigma(q_1(x_1, \dots, x_{\mathrm{rk}(q_1)}), \dots, q_k(x_{m - \mathrm{rk}(q_k) + 1}, \dots, x_m)) \to \bot$ with $m = \sum_{i=1}^{k} \mathrm{rk}(q_i)$

for every $\sigma \in \Sigma^{(k)}$ and $q_1, \dots, q_k \in Q'$ such that there exists no $l \to r \in R$ with $l(\varepsilon) = \sigma$ and $l(j) = q_j$ for every $j \in [k]$. It should be clear that $N$ and $M$ are equivalent since the new transitions only lead to the new state $\bot$, which is not final. Moreover, $N$ is total and deterministic, and $N$ is linear whenever $M$ is so. □

In the deterministic case we can choose between one-symbol normal form and another normal form: the deterministic mbutt. This allows us to characterize the classes d-XMBOT and ld-XMBOT in terms of top-down tree transducers, using the result of [13].
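To make Example 6 tangible, the rewrite semantics of Definition 2 can be simulated on small inputs. The following Python sketch is an illustration under assumptions, not the authors' implementation: trees are tuples `(symbol, child_1, ..., child_k)`, variables are strings, rules are pairs `(lhs, rhs)`, and all function names are hypothetical. Because the symbol alpha is both an input and an output symbol here, the sketch rewrites only at positions that are not below a state, so that every sentential form stays of the shape required by Definition 2. The search is bounded by `max_forms`, since epsilon rules can in general produce infinitely many sentential forms.

```python
def match(pattern, tree, theta):
    """Extend theta so that pattern instantiated by theta equals tree, else None."""
    if isinstance(pattern, str):          # variable; left-hand sides are linear
        return {**theta, pattern: tree}
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return None
    for p, s in zip(pattern[1:], tree[1:]):
        theta = match(p, s, theta)
        if theta is None:
            return None
    return theta


def substitute(t, theta):
    if isinstance(t, str):
        return theta[t]
    return (t[0],) + tuple(substitute(c, theta) for c in t[1:])


def successors(xi, rules, states):
    """All forms reachable from xi in one step, never rewriting below a state."""
    out = []
    for lhs, rhs in rules:
        theta = match(lhs, xi, {})
        if theta is not None:
            out.append(substitute(rhs, theta))
    if xi[0] not in states:
        for i in range(1, len(xi)):
            out += [xi[:i] + (z,) + xi[i + 1:]
                    for z in successors(xi[i], rules, states)]
    return out


def transform(t, rules, states, final, max_forms=10_000):
    """Outputs u such that t derives f(u, ...) for a final state f (bounded search)."""
    seen, todo, results = {t}, [t], set()
    while todo and len(seen) <= max_forms:
        xi = todo.pop()
        if xi[0] in final:
            results.add(xi[1])
        for zeta in successors(xi, rules, states):
            if zeta not in seen:
                seen.add(zeta)
                todo.append(zeta)
    return results


def count(t, symbols):
    if isinstance(t, str):
        return 0
    return (t[0] in symbols) + sum(count(c, symbols) for c in t[1:])


def is_one_symbol(rule, sigma, delta):
    """card(pos_Sigma(l)) + card(pos_Delta(r)) == 1, as in Definition 4."""
    lhs, rhs = rule
    return count(lhs, sigma) + count(rhs, delta) == 1


A = ("alpha",)
SIGMA, DELTA = {"sigma", "alpha"}, {"delta", "alpha"}
R_M = [(("sigma", A), ("q", A)),
       (("sigma", ("q", "x1")), ("q", ("delta", "x1", A)))]
R_N = [(A, ("q1",)), (("q2",), ("q", A)),
       (("q4", "x1"), ("q3", "x1", A)), (("sigma", ("q1",)), ("q2",)),
       (("sigma", ("q", "x1")), ("q4", "x1")),
       (("q3", "x1", "x2"), ("q", ("delta", "x1", "x2")))]
STATES_N = {"q", "q1", "q2", "q3", "q4"}
t = ("sigma", ("sigma", ("sigma", A)))
```

On the input `t`, both rule sets yield the single output delta(delta(alpha, alpha), alpha), which is consistent with the equivalence of M and N claimed in Example 6; in addition, every rule of `R_N` passes `is_one_symbol`, while no rule of `R_M` does.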

Theorem 8. For every deterministic xmbutt $M$ there exists an equivalent deterministic mbutt $N$. Moreover, if $M$ is linear (respectively, nondeleting), then so is $N$. Hence, d-XMBOT = d-TOP^R and ld-XMBOT = d-TOP^R_su.

Proof. We just apply the first construction in the proof of Theorem 5 (to obtain an equivalent STA) and then remove all epsilon rules in the usual way. Consequently, we obtain an equivalent deterministic mbutt $N$. Note that $N$ is linear (respectively, nondeleting) if $M$ is so. To obtain a deterministic multi bottom-up tree transducer of [13] we also have to add a special root symbol and add rules that, while consuming the special root symbol, project on the first argument of a final state. It is proved in [13] that such deterministic multi bottom-up tree transducers have the same power as deterministic top-down tree transducers with regular look-ahead. Note that the construction in [13, Lemma 4.2] yields a deterministic mbutt if we drop the special root symbol and associated rules, and turn the states in the left-hand sides of those rules into final states.

The second equality was already suggested in the Conclusion of [14]. We prove it by reconsidering the proofs of [13, Lemmata 4.1 and 4.2]. The proof of [13, Lemma 4.2] shows that if the top-down tree transducer with regular look-ahead is single-use, then the mbutt will be linear. To show the other inclusion we need a minor variation of the proof of [13, Lemma 4.1], as follows. Let $M = (Q, \Sigma, \Delta, F, R)$ be a deterministic mbutt, and let $f \colon Q \to \{0, 1\}$ be the characteristic function of $F$. We define an equivalent top-down tree transducer $\langle M', c \rangle$ with regular look-ahead, with $M' = (Q', \Sigma, \Delta, I, R')$. Intuitively, for every input subtree $t$, $\langle M', c \rangle$ uses its look-ahead to determine the last rule that $M$ applied in its (unique) derivation $t \Rightarrow_M^* q(u_1, \dots, u_m)$ with $q \in Q$. Then, for each $n \in [m]$, it uses a state $n$ to compute $u_n$.
To simulate the final states of $M$, state $1$ (which computes $u_1$) is split into two states: one to be used as initial state (when $q$ is in $F$) and one to be used otherwise. In fact, for technical convenience, every state $n$ is split into two states, by storing $f(q)$. Thus we define $Q' = \{0, 1\} \times [mx]$ where $mx = \max\{\mathrm{rk}(q) \mid q \in Q\}$. Moreover, $I = \{\langle 1, 1 \rangle\}$ and $R'$ is defined as follows. Let $\rho \colon l \to r$ be a rule in $R$, with $l(\varepsilon) = \sigma \in \Sigma^{(k)}$, $l(i) = q_i$ for every $i \in [k]$, and $r(\varepsilon) = q \in Q^{(m)}$. Then, for every $n \in [m]$, the rule

$\rho(n) \colon \langle f(q), n \rangle(\sigma(x_1, \dots, x_k)) \to r|_n\theta$

is in $R'$ where $\theta \colon \mathrm{var}(l) \to Q'(X)$ is defined for every $i \in [k]$ and $j \in [\mathrm{rk}(q_i)]$ by $\theta(l(ij)) = \langle f(q_i), j \rangle(x_i)$. The regular look-ahead $c(\rho(n))$ consists of all trees $t \in T_\Sigma$ such that (i) $t(\varepsilon) = \sigma$ and (ii) for every $i \in [k]$ there exists $\xi_i \in Q(T_\Delta)$ with $t|_i \Rightarrow_M^* \xi_i$ and $\xi_i(\varepsilon) = q_i$. Note that $c(\rho(n))$ is a recognizable tree language, because the state behavior of an mbutt is the same as that of a finite-state tree automaton. Note also that, as observed after Example 6, there is at most one such $\xi_i$ for every $t|_i$. Clearly $\langle M', c \rangle$ is deterministic: for $l' = \langle b, n \rangle(\sigma(x_1, \dots, x_k))$ and $t \in T_\Sigma$, if $\rho(n) \colon l' \to r'$ with $\sigma = t(\varepsilon)$ and $t \in c(l' \to r')$, then the look-ahead $c(\rho(n))$ determines the states $q_i$ in the left-hand side of $\rho$, and hence $\rho$ itself (by the determinism of $M$) and thus $\rho(n)$. Moreover, if $M$ is linear, then for every $j$ and $x_i$ there is at most one occurrence of $\langle f(q_i), j \rangle(x_i)$ in $r\theta$; hence, $\langle M', c \rangle$ is single-use.

It is straightforward to show by structural induction on $t$ that for every $\langle b, n \rangle \in Q'$ and $(t, u) \in T_\Sigma \times T_\Delta$ the following statement holds: $\langle b, n \rangle(t) \Rightarrow_{\langle M', c \rangle}^* u$ if and only if there exists $\xi \in Q(T_\Delta)$ such that (i) $t \Rightarrow_M^* \xi$, (ii) $b = f(\xi(\varepsilon))$, (iii) $n \in [\mathrm{rk}(\xi(\varepsilon))]$, and (iv) $u = \xi|_n$. Taking $b = n = 1$ this shows that $\tau_{\langle M', c \rangle} = \tau_M$. □

Consequently, every deterministic xmbutt can even be turned into an equivalent total deterministic xmbutt (see Lemma 7). We note that ld-XMBOT is strictly included in the class ld-MBOT_fkv of the linear deterministic multi bottom-up tree transformations discussed in [14]. The inclusion follows from the proof of Theorem 8. It is strict, due to the use of the special root symbol in [14]. For instance, it is easy to see that the transformation $\{(t, \delta(t, t)) \mid t \in T_\Sigma\}$, with $\delta \in \Delta^{(2)}$, is in ld-MBOT_fkv (cf. the proof of Theorem 10) but not in ld-XMBOT (though it is in l-XMBOT). The results shown in [14] for ld-MBOT_fkv also hold for ld-XMBOT.

The previous theorem already showed that xmbutts and mbutts are closely related. Let us investigate this relation in slightly more detail. A tree transformation $\tau \subseteq T_\Sigma \times T_\Delta$ is finitary if $\{u \mid (t, u) \in \tau\}$ is finite for every $t \in T_\Sigma$. Note that $\tau_M$ is finitary for every mbutt $M$.

Theorem 9. For every xmbutt $M$ such that $\tau_M$ is finitary, there exists an equivalent mbutt $N$. Moreover, if $M$ is linear, then so is $N$.

Proof. By Theorems 3 and 5, we can assume that $M = (Q, \Sigma, \Delta, F, R)$ is nondeleting and in one-symbol normal form. Let $G = (Q, E)$ be the directed graph with $E = \{(l(\varepsilon), r(\varepsilon)) \mid l \to r \in R^\varepsilon\}$. Note that, by one-symbol normal form, any epsilon rule $l \to r \in R^\varepsilon$ contains at least one output symbol (in $r$). Hence, since $M$ is nondeleting and $\tau_M$ finitary, we can assume (without loss of generality) that $G$ is acyclic. In fact, if $G$ is cyclic, then all states participating in cycles cannot be reachable and thus can safely be removed. Since $G$ is acyclic, we can remove all epsilon rules in the standard way to obtain $N$. □
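The graph $G$ used in the proof of Theorem 9 is easy to compute from a rule set. The sketch below is a hedged illustration only: it assumes rules are pairs `(lhs, rhs)` of tuple-encoded trees `(symbol, child_1, ..., child_k)` with string variables, it treats a rule as an epsilon rule exactly when the root of its left-hand side is a state, and it checks acyclicity with a standard depth-first search. The function names, the rule list `R_N` (the one-symbol normal form from Example 6), and the extra looping edge are assumptions introduced for this example.

```python
def epsilon_graph(rules, states):
    """E = {(l(eps), r(eps)) | l -> r is an epsilon rule}."""
    return {(lhs[0], rhs[0]) for lhs, rhs in rules if lhs[0] in states}


def is_acyclic(nodes, edges):
    """Depth-first search with three colors; a gray successor closes a cycle."""
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY
        for m in succ[n]:
            if color[m] == GRAY:
                return False
            if color[m] == WHITE and not visit(m):
                return False
        color[n] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in nodes)


A = ("alpha",)
STATES = {"q", "q1", "q2", "q3", "q4"}
R_N = [(A, ("q1",)), (("q2",), ("q", A)),
       (("q4", "x1"), ("q3", "x1", A)), (("sigma", ("q1",)), ("q2",)),
       (("sigma", ("q", "x1")), ("q4", "x1")),
       (("q3", "x1", "x2"), ("q", ("delta", "x1", "x2")))]
E = epsilon_graph(R_N, STATES)
# Hypothetical extra epsilon rule q(x1) -> q(delta(x1, alpha)) adds a loop at q.
E_LOOP = E | {("q", "q")}
```

For `R_N` the graph has the edges (q2, q), (q4, q3), and (q3, q) and is acyclic. Adding the hypothetical looping rule makes it cyclic, and that rule would let a single input derive infinitely many outputs, which is the situation the finitary assumption excludes.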
□

Finally, we verify that l-XMBOT is suitably powerful for applications in machine translation. We do this by showing that all transformations of l-XTOP are also in l-XMBOT. In particular, this shows that xmbutts can handle rotations [23].

Theorem 10. l-XTOP $\subsetneq$ l-XMBOT.

Proof. The inclusion can be proved in a similar manner as l-TOP $\subseteq$ l-BOT [8, Theorem 2.8], where l-BOT denotes the class of transformations computed by linear bottom-up tree transducers [39, 8]. Let us now recall the idea. Let $q(l) \to r$ be a rule of the linear xtt. We construct a rule $l\theta \to q(r')$ of the xmbutt where $\theta$ is the substitution that maps every variable $v$ to a new nullary state $\star$, if $v$ does not occur in $r$, or to $p(v)$ with $p$ being the state such that $p(v)$ occurs in $r$. Finally, $r'$ is obtained from $r$ by replacing all occurrences of $p(v)$ simply by $v$. In addition, we add the rule $\sigma(\star, \dots, \star) \to \star$ for every $\sigma \in \Sigma$.
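The rule construction just recalled can be sketched in Python. This is only an illustration under stated assumptions: trees are encoded as pairs `(label, children)`, a state call $p(v)$ is the tree `(p, [(v, [])])`, and the names `convert_rule`, `bottom_rules` and the marker `"BOT"` (standing for the new nullary state) are hypothetical and do not come from the paper.

```python
def convert_rule(q, l, r, states, variables, bottom="BOT"):
    """Turn a linear xtt rule q(l) -> r into an xmbutt rule (l theta, q(r'))."""
    state_of = {}  # variable v -> state p such that p(v) occurs in r

    def strip(t):
        # Build r' from r by replacing every state call p(v) with v.
        label, kids = t
        if label in states:
            v = kids[0][0]
            assert v in variables and v not in state_of  # linearity of the xtt
            state_of[v] = label
            return (v, [])
        return (label, [strip(k) for k in kids])

    r_prime = strip(r)

    def subst(t):
        # Apply theta to l: v becomes p(v) if used in r, else the nullary state.
        label, kids = t
        if label in variables:
            if label in state_of:
                return (state_of[label], [(label, [])])
            return (bottom, [])
        return (label, [subst(k) for k in kids])

    return subst(l), (q, [r_prime])


def bottom_rules(input_ranks, bottom="BOT"):
    """Rules sigma(BOT, ..., BOT) -> BOT for every input symbol sigma."""
    return [((s, [(bottom, []) for _ in range(k)]), (bottom, []))
            for s, k in input_ranks.items()]


# Example: the deleting rule q(sigma(x1, x2)) -> delta(p(x2)).
lhs, rhs = convert_rule(
    "q",
    ("sigma", [("x1", []), ("x2", [])]),
    ("delta", [("p", [("x2", [])])]),
    states={"q", "p"},
    variables={"x1", "x2"},
)
```

In the example, the deleted variable `x1` is replaced by the nullary state, which the rules from `bottom_rules` can reach on any input subtree, while `x2` keeps the state `p` that processes it.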

Every transformation of l-XTOP preserves recognizability [28, Theorem 4]. On the other hand, it is simple to construct a linear (nondeleting, and deterministic) mbutt that computes the transformation $\{(\gamma(t), \sigma(t,t)) \mid t \in T_{\Sigma \setminus \{\gamma\}}\}$ where $\gamma \in \Sigma^{(1)}$ and $\sigma \in \Sigma^{(2)}$. Consequently, not every transformation of lnd-XMBOT preserves recognizability, which proves strictness. □

4 Composition Construction

In this section, we investigate compositions of tree transformations computed by xmbutts. Let us first recall the classical composition results for bottom-up tree transducers [8, 6]. Let $M$ and $N$ be bottom-up tree transducers. If $M$ is linear or $N$ is deterministic, then the composition of the transformations computed by $M$ and $N$ can be computed by a bottom-up tree transducer. As a special case, the classes of transformations computed by linear, linear and nondeleting, and deterministic bottom-up tree transducers are closed under composition.

In our setting, let $M$ and $N$ be xmbutts. We will prove that if $M$ is linear or $N$ is deterministic, then there is an xmbutt $M;N$ that computes $\tau_M ; \tau_N$. In particular, we prove that l-XMBOT, d-XMBOT, and ld-XMBOT are closed under composition. The closure of l-XMBOT was first presented in [26, Propositions II.B.5 and II.B.7]. The closure of d-XMBOT is already known: Theorem 8 shows that d-XMBOT coincides with d-TOP$^R$, which is closed under composition [9, Theorem 2.11]. In [27, Proposition 2.5] closure under composition of STA transformations was shown for a different type of determinism (in which final states are allowed to have epsilon rules, but must have rank 1). Finally, the closure of ld-XMBOT is to be expected from its characterization in Theorem 8 and the fact that the single-use restriction was introduced in [15, 18] to guarantee the closure under composition of attribute grammar transformations (see [25, Theorem 3]). Let us now prepare the composition construction.
In the following, let $M = (Q, \Sigma, \Gamma, F_M, R_M)$ and $N = (P, \Gamma, \Delta, F_N, R_N)$ be xmbutts such that $Q$, $P$, and $\Sigma \cup \Gamma \cup \Delta$ are pairwise disjoint (which can always be assumed without loss of generality). We define the uniquely-ranked alphabet

$Q\langle P\rangle = \{q\langle p_1, \dots, p_n\rangle \mid q \in Q^{(n)},\ p_1, \dots, p_n \in P\}$

such that $\mathrm{rk}(q\langle p_1, \dots, p_n\rangle) = \sum_{i=1}^{n} \mathrm{rk}(p_i)$ for every $q \in Q^{(n)}$ and $p_1, \dots, p_n \in P$. Let $\Omega = \Sigma \cup \Gamma \cup \Delta \cup X$. We define the mapping $\varphi\colon T_{\Omega \cup Q\langle P\rangle} \to T_{\Omega \cup Q \cup P}$ such that for every $q\langle p_1, \dots, p_n\rangle \in Q\langle P\rangle^{(k)}$, $\omega \in \Omega^{(k)}$, and $t_1, \dots, t_k \in T_{\Omega \cup Q\langle P\rangle}$

$\varphi(q\langle p_1, \dots, p_n\rangle(t_1, \dots, t_k)) = q(p_1(\varphi(t_1), \dots, \varphi(t_l)), \dots, p_n(\varphi(t_m), \dots, \varphi(t_k)))$

$\varphi(\omega(t_1, \dots, t_k)) = \omega(\varphi(t_1), \dots, \varphi(t_k))$

where $l = \mathrm{rk}(p_1)$ and $m = k - \mathrm{rk}(p_n) + 1$. Thus, we group the subtrees below the corresponding state $p_i$. Note that $\varphi$ is a linear and nondeleting tree homomorphism, which acts as a bijection from $T_\Sigma(Q\langle P\rangle(T_\Delta(X)))$ to $T_\Sigma(Q(P(T_\Delta(X))))$. In the sequel, we identify $t$ with $\varphi(t)$ for all $t \in T_\Sigma(Q\langle P\rangle(T_\Delta(X)))$. We display $\varphi$ in Fig. 3. Now let us present the composition construction.
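The grouping performed by the homomorphism can be made concrete with a small Python sketch. The encoding is an assumption made only for illustration: a tree is a pair `(label, children)`, a composed state is the label `("Q<P>", q, (p1, ..., pn))`, and `rank_p` gives the rank of each state of $N$; none of these names come from the paper.

```python
def phi(tree, rank_p):
    """Ungroup composed states: q<p1,...,pn>(t1,...,tk) becomes
    q(p1(t1,...,tl), ..., pn(tm,...,tk)), splitting by the ranks of the p_i."""
    label, children = tree
    kids = [phi(c, rank_p) for c in children]
    if isinstance(label, tuple) and label[0] == "Q<P>":
        _, q, ps = label
        # The rank of the composed state is the sum of the ranks of its p_i.
        assert sum(rank_p[p] for p in ps) == len(kids)
        grouped, start = [], 0
        for p in ps:
            grouped.append((p, kids[start:start + rank_p[p]]))
            start += rank_p[p]
        return (q, grouped)
    # All other symbols are copied unchanged.
    return (label, kids)


# A composed state q<p1, p2> with rk(p1) = 1 and rk(p2) = 2 over three subtrees.
t = (("Q<P>", "q", ("p1", "p2")), [("t1", []), ("t2", []), ("t3", [])])
image = phi(t, {"p1": 1, "p2": 2})
```

Here `image` is the tree `q(p1(t1), p2(t2, t3))`: the first subtree goes below `p1` and the remaining two below `p2`, following the left-to-right grouping by rank.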

14 hp 1 ; p 2 i t 1 t 2 t 3 7! p 1 t 1 p 2 t 2 t 3 Fig. 3. Tree homomorphism ' where 2 Q (2), p 1 2 P (1), and p 2 2 P (2). Denition 11. Let M = (Q; ; ; F M ; R M ) be an xmbutt in one-symbol normal form and N = (P; ; ; F N ; R N ) an STA. The composition of M and N is the STA M ; N = (QhP i; ; ; F M hf N i; R 1 [ R 2 [ R 3 ) with: R 1 = fl! r j l 2 LHS( ) and 9 2 R M : l ) M rg R 2 = fl! r j l 2 LHS(") and 9 2 R " N : l ) N rg R 3 = fl! r j l 2 LHS(") and R " M ; 2 2 R N : l () 1 M ; )2 N ) rg where LHS( ) and LHS(") are the sets of normalized trees of (QhP i(x)) and QhP i(x), respectively. To illustrate the implicit use of ', let us show the \ocial" denition of R 1 : R 1 = fl! r j l 2 LHS( ); r 2 QhP i(t (X)); and 9 2 R M : '(l) ) M '(r)g : Note that the construction preserves linearity. Moreover, it preserves determinism if N is an mbutt (in which case R 2 is empty, of course). The remaining part of this section will investigate when M;N = M ; N, but rst let us illustrate the construction on our small running example. Example 12. Let M be the xmbutt N of Example 6 in one-symbol normal form, and let N = (fg (1) ; h (1) g; ; ; fgg; R N ) be the linear and nondeleting STA with = [ f (1) g and R N = f! h(); h(x 1 )! h((x 1 )); (h(x 1 ); h(x 2 ))! g((x 1 ; x 2 ))g : Clearly, N computes f((; ); ( i (); j ()) j i; j 2 Ng, and hence M ; N is f((()); ( i (); j ()) j i; j 2 Ng. Now let us construct M ; N. As states we obtain fhgi (1) ; hhi (1) ; 1 hi (0) ; 2 hi (0) ; 3 hg; gi (2) ; 3 hg; hi (2) ; 3 hh; gi (2) ; 3 hh; hi (2) ; 4 hgi (1) ; 4 hhi (1) g ; of which only hgi is nal. We present some relevant rules only [left in ocial form l! r; right in alternative notation '(l)! '(r)]. The 1st, 2nd, and 5th 14

15 rules are in R 1 (of Denition 11), the 4th rule is in R 2, and the remaining rules are in R 3.! 1 hi! 1 ( 1 hi)! 2 hi ( 1 )! 2 2 hi! hhi() 2! (h()) hhi(x 1 )! hhi((x 1 )) (h(x 1 ))! (h((x 1 ))) (hhi(x 1 ))! 4 hhi(x 1 ) ((h(x 1 )))! 4 (h(x 1 )) 4 hhi(x 1 )! 3 hh; hi(x 1 ; ) 4 (h(x 1 ))! 3 (h(x 1 ); h()) 3 hh; hi(x 1 ; x 2 )! hgi((x 1 ; x 2 )) 3 (h(x 1 ); h(x 2 ))! (g((x 1 ; x 2 ))) ut Next, we will prove that M ; N is in XMBOT provided that (i) M is linear or (ii) N is deterministic. Without loss of generality, we can assume that M is in one-symbol normal form, by Theorem 5, and that it is nondeleting in case (i), by Theorem 3. We can also assume that N is an STA in case (i), by Theorem 5, and a total deterministic mbutt in case (ii), by Theorem 8 and Lemma 7. Thus, we can meet the reuirements of Denition 11, and we will prove that with the above assumptions the composition construction is correct, i.e., that M ; N computes M ; N (cf. [6, Theorem 6]). Henceforth we assume the notation of Denition 11. We start with a simple lemma. It shows that in a derivation that uses steps of M and N (like the derivations of M ;N) we can always perform all steps of M rst and only then perform the derivation steps of N. This already proves one direction needed for the correctness of the composition construction. Lemma 13. Let t 2 T and 2 Q(P (T )). If t ) where ) is ) M [ ) N, then t () M ; ) N ). In particular, M;N M ; N. Proof. Let SF = T (Q(T (P (T )))). It obviously suces to prove the following statement: For ; 0 2 SF, if () N ; ) M ) 0, then () M ; ) N ) 0. To this end, let 2 2 R N, w 2 2 pos(), 2 SF, 1 2 R M, and w 1 2 pos() be such that ) 2;w2 N ) 1;w1 M 0. Clearly, w 2 is not a prex of w 1, and thus, 1 is applicable to at w 1. Hence there exists 0 2 SF such that ) 1;w1 M 0. If w 1 is not a prex of w 2, then 0 ) 2;w2 N 0 because rewriting occurs in incomparable positions. Now suppose that w 1 is a prex of w 2. Let 1 be l! 
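$r$, and let $v \in \mathrm{pos}_X(l)$ and $y \in \mathbb{N}^*$ be such that $w_1 v y = w_2$. Then $\zeta' \,(\Rightarrow_N^{\rho_2, w_1 v_1 y} ; \cdots ; \Rightarrow_N^{\rho_2, w_1 v_m y})\, \xi'$ where $\mathrm{pos}_{l(v)}(r) = \{v_1, \dots, v_m\}$. □

Next, we prove that $\tau_M ; \tau_N \subseteq \tau_{M;N}$ under the above assumptions on $M$ and $N$. We achieve this by a standard induction over the length of the derivation.

Lemma 14. Let $t \in T_\Sigma$ and $\xi \in Q(P(T_\Delta))$ be such that $t \,(\Rightarrow^*_M ; \Rightarrow^*_N)\, \xi$. If (i) $M$ is linear and nondeleting, or (ii) $N$ is a total deterministic mbutt, then $t \Rightarrow^*_{M;N} \xi$. With $\xi \in F_M(F_N(T_\Delta))$ we obtain $\tau_M ; \tau_N \subseteq \tau_{M;N}$.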

Proof. Let $t = \sigma(t_1, \dots, t_k)$ for some $\sigma \in \Sigma^{(k)}$ and $t_1, \dots, t_k \in T_\Sigma$. The proof is by induction on the length of the derivation in $M$. Let $l \to r \in R_M$ and $\theta\colon X \to T_\Gamma$ be such that $t \Rightarrow^*_M l\theta \Rightarrow_M r\theta \Rightarrow^*_N \xi$. Since $M$ is in one-symbol normal form, we have $\mathrm{card}(\mathrm{pos}_\Gamma(r)) \le 1$. If $l \to r$ is input-consuming, let $\zeta = \xi$. If $l \to r$ is an epsilon rule with $r(i) \in \Gamma$, then there exists a unique $\zeta$ and a reordering $r\theta \Rightarrow^*_N \zeta \Rightarrow^*_N \xi$ of the given derivation $r\theta \Rightarrow^*_N \xi$ such that every rule in the derivation $\zeta \Rightarrow^*_N \xi$ applies at position $i$; note that in the first step of the latter derivation an input-consuming rule is applied. It should now be clear that, since either $M$ is linear or $N$ is a deterministic mbutt, there exists a substitution $\theta'\colon X \to P(T_\Delta)$ such that $\zeta = r\theta'$. In the deterministic case this uses the property of deterministic mbutts mentioned between Example 6 and Lemma 7. Since $M$ is nondeleting or $N$ is total, we may assume that $\theta(x) \Rightarrow^*_N \theta'(x)$ for every $x \in \mathrm{var}(l)$. Thus, $t \Rightarrow^*_M l\theta \Rightarrow^*_N l\theta'$. By the induction hypothesis (applied to $t_1, \dots, t_k$ if $l \to r$ is input-consuming, and to $t$ if it is an epsilon rule), we obtain $t \Rightarrow^*_{M;N} l\theta' \Rightarrow_M r\theta' \Rightarrow^*_N \xi$. If $l \to r$ is an input-consuming rule, then $r\theta' = \xi$, and thus by the definition of $R_1$ we conclude $t \Rightarrow^*_{M;N} l\theta' \Rightarrow_{M;N} r\theta' = \xi$. On the other hand, if $l \to r$ is an epsilon rule, then the first rule applied in $r\theta' \Rightarrow^*_N \xi$ is an input-consuming rule of $N$ and all remaining rules are epsilon rules of $N$. By the definition of $R_3$ and $R_2$ we conclude $t \Rightarrow^*_{M;N} l\theta' \Rightarrow^*_{M;N} \xi$. □

Now we are ready to prove the main theorem of this section.

Theorem 15. l-XMBOT ; XMBOT $\subseteq$ XMBOT and XMBOT ; d-XMBOT $\subseteq$ XMBOT. Moreover, l-XMBOT, d-XMBOT, and ld-XMBOT are closed under composition.

Proof. The inclusions follow directly from Lemmata 13 and 14 using Lemma 7 and Theorems 3, 5, and 8 to establish the preconditions of Definition 11 and Lemma 14. The closure results follow from the fact that the composition construction preserves linearity and determinism (provided that $N$ is a total deterministic mbutt in the latter case).
□

Note that with the help of Theorem 9, our approach can be used to reprove [28, Theorem 11] (which is obtained from Theorem 15 by removing every X).

5 Relation to Extended Top-down Tree Transducers

Now, let us focus on the limitation of the power of xmbutts. By [28, Theorem 14] every mbutt computes a transformation of ln-TOP ; d-TOP and can thus be simulated by two top-down tree transducers. Moreover, if the mbutt is linear, then the second top-down tree transducer is single-use. Here we prove a similar result for xmbutts.

Theorem 16. l-XMBOT = ln-XTOP ; d-TOP$_{su}$ and XMBOT = ln-XTOP ; d-TOP.

Proof. By Theorems 8, 10, and 15, we obtain

ln-XTOP ; d-TOP$_{su}$ $\subseteq$ l-XMBOT ; ld-XMBOT $\subseteq$ l-XMBOT

ln-XTOP ; d-TOP $\subseteq$ l-XMBOT ; d-XMBOT $\subseteq$ XMBOT.

For the decomposition results, we employ the standard idea of separating the input and output behavior of the given xmbutt $M$ (cf. [8, Theorem 3.15]). For each input tree, the linear and nondeleting xtt $M_1$ will output "rule trees" that encode which rules could be applied given the input tree. The second xtt $M_2$ will then deterministically execute these rules and create the output trees. Note that $M_2$ will be similar to the transducer $M'$ in the proof of Theorem 8. In particular, it will have one state for each argument position of a state. More formally, if $t \Rightarrow^*_M q(u_1, \dots, u_m)$, then there is a "rule tree" $s$ such that $q(t) \Rightarrow^*_{M_1} s$ and $n(s) \Rightarrow^*_{M_2} u_n$ for every $n \in [m]$.

Let $M = (Q, \Sigma, \Delta, F, R)$ be a nondeleting xmbutt in one-symbol normal form. This can be assumed without loss of generality (see Theorems 3 and 5). We turn $R$ into a uniquely-ranked alphabet such that $\mathrm{rk}(l \to r) = \mathrm{card}(\mathrm{pos}_Q(l))$ for every rule $l \to r \in R$. To avoid many parentheses, we write $l_w$ instead of $l(w)$ for all $l \in T_\Sigma(Q(X))$ and $w \in \mathrm{pos}(l)$. Now, we define the xtt $M_1 = (Q, \Sigma, R, F, R_1)$ with

$R_1 = \{ r(\varepsilon)(l_\varepsilon(x_1, \dots, x_k)) \to (l \to r)(l_1(x_1), \dots, l_k(x_k)) \mid l \to r \in R^\Sigma \} \cup \{ r(\varepsilon)(x_1) \to (l \to r)(l_\varepsilon(x_1)) \mid l \to r \in R^\varepsilon \}.$

Clearly, $M_1$ is linear and nondeleting. The evaluation of the intermediate trees produced by $M_1$ is performed by the deterministic top-down tree transducer $M_2 = ([mx], R, \Delta, \{1\}, R_2)$ where $mx = \max\{\mathrm{rk}(q) \mid q \in Q\}$ and for every rule $l \to r \in R^{(k)}$ and $n \in [\mathrm{rk}(r(\varepsilon))]$ we have

$n((l \to r)(x_1, \dots, x_k)) \to r|_n\theta \in R_2$

with $\theta\colon \mathrm{var}(r) \to [mx](X)$ given by

$\theta(x) = \begin{cases} j(x_i) & \text{if } ij \in \mathrm{pos}_x(l) \text{ with } i, j \in \mathbb{N} \\ j(x_1) & \text{if } j \in \mathrm{pos}_x(l) \text{ with } j \in \mathbb{N}. \end{cases}$

Note that if $M$ is linear, then for every $j(x) \in [mx](X)$ there exists at most one occurrence of $j(x)$ in $r\theta$. Consequently, $M_2$ is single-use.
The proof of $\tau_{M_1} ; \tau_{M_2} = \tau_M$ is straightforward and left to the reader. □

Let us look at the equalities of Theorems 8 and 16 in some more detail. We note that the $\subseteq$ direction of all four equalities is proved in essentially the same manner. If we compare the deterministic case (Theorem 8) to the nondeterministic case (Theorem 16), then we note that in the former case there exists at most one successful "rule tree", which can be constructed by a deterministic, finite-state bottom-up relabeling [8, 9]. Thus, the deterministic top-down tree transducer can query its look-ahead facility for the rule that labels its current position in the single successful "rule tree".

Now let us consider the implications of Theorem 16. First, it shows that we can separate nondeterminism and state checking on the one hand (performed by the linear and nondeleting xtt) from evaluation on the other hand (performed by the deterministic top-down tree transducer). Since linear xtts preserve recognizability [28, Theorem 4], the intermediate set of trees from $T_R$, the rule trees mentioned in the proof of Theorem 16, form a recognizable tree language for every input tree of $T_\Sigma$. Formally, for every $t \in T_\Sigma$ the set $\{u \in T_R \mid (t,u) \in \tau_{M_1}\}$ is recognizable. Even the set $\{u \in T_R \mid \exists t \in T_\Sigma\colon (t,u) \in \tau_{M_1}\}$ is recognizable, which shows that the set of rule trees of an xmbutt is always recognizable. This is a strong indication toward the existence of efficient training algorithms.

Second, Theorem 16 limits the power of xmbutts. They have the same power as a composition of two (special) xtts. In particular, the first equation of Theorem 16 shows in a precise way how much stronger l-XMBOT is with respect to ln-XTOP. Since ln-XTOP transformations preserve recognizability, this characterization and [28, Theorem 14] imply that

l-XMBOT(REC) = l-MBOT(REC) = d-TOP$_{su}$(REC).

The single-use top-down tree transducer has a bounded copying facility: it makes at most $\mathrm{card}(Q)$ copies of each input subtree; thus it is a so-called "finite-copying" top-down tree transducer, see [12, Definition (3.1.9)]. Hence, denoting the class of deterministic finite-copying top-down tree transformations by d-TOP$_{fc}$, we have d-TOP$_{su}$(REC) $\subseteq$ d-TOP$_{fc}$(REC). Actually, by [11, Corollary 7.5], these classes are equal, and so

l-XMBOT(REC) = l-MBOT(REC) = d-TOP$_{fc}$(REC).

By [12, Corollary (3.2.8)], d-TOP$_{fc}$(REC) is a proper subclass of d-TOP(REC) (which equals XMBOT(REC) by Theorem 16); intuitively, this is because top-down tree transducers can do unbounded copying.
Consequently, l-XMBOT and d-TOP are incomparable subclasses of XMBOT.

The class of string languages yield(d-TOP$_{fc}$(REC)), consisting of the yields of tree languages in d-TOP$_{fc}$(REC), has been investigated in [33, 40, 30], motivated by the grammatical generation of discontinuous constituents in natural languages. Several formal models that generate this class are discussed in [30, Section 5] (cf. also the discussion in [10, Section 6]). One of these models is the multiple context-free grammar of [33]. In fact, the so-called parallel multiple context-free grammar ([33, Section 2.2]) can be viewed as a variation of the mbutt, generating the yields of its output trees; similarly, the multiple context-free grammar is a variation of the linear mbutt. Thus, it is immediate that the class PMCFL of parallel multiple context-free languages equals yield(XMBOT(REC)) and that the class MCFL of multiple context-free languages equals yield(l-XMBOT(REC)). We finally mention that the relational tree grammars of [31] (which are closely related to the multiple context-free grammar) generate the class d-TOP$_{fc}$(REC), as proved in [31, Proposition 4.8].

Finally, let us summarize the relations between classes of tree transformations computed by several tree transducer models in an inclusion diagram (see Fig. 4). Note that all lines in the diagram are oriented upwards, and we use solid lines to denote strict inclusion and dashed lines to denote inclusion. Moreover, for any class $T$ of transformations, $T^2$ and $T^*$ denote $T ; T$ and the composition closure of $T$, respectively.

Theorem 17. Figure 4 is an inclusion diagram.

Proof. The nontrivial inclusions are by Theorems 3, 8, 10, 15, 16, and [28, Theorems 14 and 19]. To prove strictness and incomparability, we need to prove the following statements:

ld-XMBOT $\not\subseteq$ lnd-XMBOT (1)
lnd-XMBOT $\not\subseteq$ l-XTOP (2)
d-XMBOT $\not\subseteq$ l-XMBOT (3)
TOP$^2$ $\not\subseteq$ XMBOT (4)
lne-XTOP $\not\subseteq$ d-XMBOT (5)
ln-XTOP $\not\subseteq$ TOP$^2$ (6)
lne-XTOP$^2$ $\not\subseteq$ l-XTOP (7)
le-XTOP $\not\subseteq$ ln-XTOP$^2$. (8)

We leave the proof of (1) and (8) as an exercise. Statement (2) is clear by the argument in the proof of Theorem 10. By [12, Corollary ] we have

XMBOT(REC) = d-TOP(REC) $\subsetneq$ TOP(REC),

which yields (4). Statements (5) and (6) are trivial (because the tree transformations in d-XMBOT are functions and those in TOP$^2$ are finitary) and (7) is proved in [5, Section 3.4] and [29, Lemma 5.2]. Finally, it remains to prove (3). We already remarked that d-TOP $\not\subseteq$ l-XMBOT whereas d-TOP $\subseteq$ d-XMBOT by Theorem 8. Let us present a more detailed proof, which also proves that d-BOT $\not\subseteq$ l-XMBOT where d-BOT is the class of transformations computed by deterministic bottom-up tree transducers [39, 8] (i.e., deterministic mbutts in which all states have rank 1). Let $M = (\{\star\}, \Sigma, \Delta, \{\star\}, R')$ with $\Sigma = \{a^{(1)}, e^{(0)}\}$, $\Delta = \{a^{(2)}, e^{(0)}\}$, and $R'$ contains the two rules

$e \to \star(e)$ and $a(\star(x_1)) \to \star(a(x_1, x_1))$.

The transformation $\tau_M \in$ d-BOT transforms a unary tree $s \in T_\Sigma$ into a full binary tree such that along all paths from the root to a leaf we read $s$; note that $\tau_M$ is a tree homomorphism, and hence also $\tau_M \in$ d-TOP. Now suppose that $\tau_M \in$ l-XMBOT.
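The behavior of the two-rule transducer can be checked on small inputs with the following Python sketch. It is an illustration only: trees are encoded as pairs `(label, children)`, and the helper names `run_m`, `size`, and `unary` are hypothetical. The sketch applies the two rules bottom-up and compares input and output sizes.

```python
def run_m(s):
    """Apply the rules e -> *(e) and a(*(x1)) -> *(a(x1, x1)) bottom-up
    and return the output tree stored below the single state."""
    label, kids = s
    if label == "e":
        return ("e", [])
    assert label == "a" and len(kids) == 1
    u = run_m(kids[0])
    return ("a", [u, u])  # the output subtree is copied


def size(t):
    """Number of nodes of a tree."""
    return 1 + sum(size(c) for c in t[1])


def unary(n):
    """The unary input tree a(a(...a(e)...)) with n occurrences of a."""
    t = ("e", [])
    for _ in range(n):
        t = ("a", [t])
    return t


sizes = [(size(unary(n)), size(run_m(unary(n)))) for n in range(5)]
```

For an input with $n$ occurrences of $a$ the input has $n + 1$ nodes while the output has $2^{n+1} - 1$ nodes, so the output size is not bounded by any constant multiple of the input size.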
Since $\tau_M$ is finitary and $\tau_M \in$ l-XMBOT, we can conclude that $\tau_M \in$ l-MBOT by Theorem 9. Next, we show that for every linear mbutt $N$ there exists an integer $c$ such that $\mathrm{size}(u) \le c \cdot \mathrm{size}(t)$ for every $(t,u) \in \tau_N$ where for
