Applications of Tree Automata Theory Lecture VI: Back to Machine Translation


Andreas Maletti, Institute of Computer Science, Universität Leipzig, Germany
on leave from: Institute for Natural Language Processing, Universität Stuttgart, Germany
Yekaterinburg, August 25, 2014
Lecture VI: Tree Transducers in SMT (A. Maletti)

Roadmap
1 Theory of Tree Automata
2 Parsing: Basics and Evaluation
3 Parsing: Advanced Topics
4 Machine Translation: Basics and Evaluation
5 Theory of Tree Transducers
6 Machine Translation: Advanced Topics
Always ask questions right away!

Tree Transducers in Machine Translation: Important relations
- SCFG = synchronous context-free grammar [CHIANG, 2007]; as a tree relation: LTG-LTG (synchronous local tree grammar); corresponds to ln-top, a special top-down tree transducer
- STSG = synchronous tree substitution grammar [EISNER, 2003]; TSG-TSG; corresponds to ln-xtop, a special extended top-down tree transducer
- STAG = synchronous tree adjunction grammar [SHIEBER, SCHABES, 1990]; TAG-TAG
- SCFTG = synchronous context-free tree grammar [NEDERHOF, VOGLER, 2012]; CFTG-CFTG

Tree Transducers in Machine Translation: Towards asymmetric relations
- STSSG = synchronous tree-sequence substitution grammar [ZHANG et al., 2008]; TSSG-TSSG
- lmbot = local shallow multi bottom-up tree transducer [BRAUNE et al., 2013]; LTG-TSSG
- ln-xmbot corresponds roughly to RTG-TSSG

Tree Transducers in Machine Translation: Extended Multi Bottom-up Tree Transducers

Extended Multi Bottom-up Tree Transducers

[Figure series: a word-aligned sentence pair, Arabic "w kana ynzran Aly h b $kl mdhk" and English "And they were looking at him in a funny way", shown with parse trees on both sides. Successive slides annotate the aligned subtrees bottom-up with states (q_w, q_kana, q_ynzran, q_Aly, q_NP, q_b, q_$kl, q_mdhk, q_PP-CLR, q_PP-MNR, ...), illustrating how the rules of an extended multi bottom-up tree transducer are extracted; the final slide lists the extracted rules.]

Extended Multi Bottom-up Tree Transducers
Synchronous grammar notation vs. tree transducer notation: in the latter, the links are expressed by shared variables x1, x2, x3.
[Figure: the same VP rule once with explicit links and once with the variables x1, x2, x3]
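Shared variables make rule application a plain match-and-substitute operation. Below is a minimal sketch; the tree encoding and all labels are made up for illustration. Trees are nested tuples, and strings starting with "x" act as the linking variables.

```python
# Applying a rule whose links are shared variables: a minimal sketch.
# Trees are nested tuples (label, child, ...); strings starting with "x"
# are variables; all labels below are made up for illustration.

def match(pattern, tree, binding):
    """Match pattern against tree, extending binding; None on failure."""
    if isinstance(pattern, str):
        if pattern.startswith("x"):
            binding[pattern] = tree
            return binding
        return binding if pattern == tree else None
    if not isinstance(tree, tuple) or len(pattern) != len(tree) or pattern[0] != tree[0]:
        return None
    for p, t in zip(pattern[1:], tree[1:]):
        if match(p, t, binding) is None:
            return None
    return binding

def substitute(pattern, binding):
    """Replace every variable in pattern by the tree bound to it."""
    if isinstance(pattern, str):
        return binding.get(pattern, pattern)
    return (pattern[0],) + tuple(substitute(p, binding) for p in pattern[1:])

# Rule VP(x1, x2, x3) -> VP(x2, x3, x1): the shared variables carry the links.
lhs, rhs = ("VP", "x1", "x2", "x3"), ("VP", "x2", "x3", "x1")
tree = ("VP", ("V", "kana"), ("NP", "h"), ("PP", "Aly"))
b = match(lhs, tree, {})
print(substitute(rhs, b))  # -> ('VP', ('NP', 'h'), ('PP', 'Aly'), ('V', 'kana'))
```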

Extended Multi Bottom-up Tree Transducers
Definition (Alternative for linear XMBOT)
A linear extended multi bottom-up tree transducer is a tuple (Q, Σ, Δ, I, R) with
- a finite set Q of states
- alphabets Σ and Δ of input/output symbols
- initial states I ⊆ Q
- a finite set R ⊆ T_Σ(Q) × Q × T_Δ(Q) of rules such that for every rule (l, q, r) ∈ R
  - each q ∈ Q occurs at most once in l
  - each q ∈ Q that occurs in r also occurs in l
[Figure: example rules]

Extended Multi Bottom-up Tree Transducers: Rules
[Figure: two extracted rules in synchronous notation, an S rule combining q_w with q_kana and q_VP, and the one-state rule for q_kana and the VP]

Tree Transducers in Machine Translation: Evaluation of XMBOT

Statistical Machine Translation: Implementation [BRAUNE et al., 2013]
- shallow lmbot implemented in the MOSES framework [KOEHN et al., 2007]
- variant of the syntax-based component [HOANG et al., 2009]
- hard- and soft-matching available, incl. optimizations like cube pruning

Statistical Machine Translation: Evaluation
English-to-German, WMT 2009 translation task; 4th version of EUROPARL and news commentary (approx. 1.5 million sentence pairs)

  System                            BLEU-4   constrained?
  University of Edinburgh (winner)  15.2
  GOOGLE                            14.7
  University of Stuttgart           12.5
  Moses (SCFG) tree-to-tree         12.6
  lmbot tree-to-tree                13.1
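For reference, BLEU-4 combines modified n-gram precisions for n = 1..4 with a brevity penalty. A simplified sentence-level sketch; real evaluations such as the one above use corpus-level BLEU, which aggregates n-gram counts over all sentence pairs, and often apply smoothing. This illustrative version assumes a single reference and no smoothing.

```python
# Illustrative sentence-level BLEU-4 (single reference, no smoothing).
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(c, n), ngrams(r, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:          # some n-gram order has no match at all
        return 0.0
    bp = 1.0 if len(c) >= len(r) else math.exp(1 - len(r) / len(c))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

print(bleu4("the cat sat on the mat", "the cat sat on the mat"))  # -> 1.0
```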

Statistical Machine Translation: Example
Source: president václav klaus has again commented on the problem of global warming
lmbot translation: das problem der globalen erwärmung hat präsident václav klaus wieder kommentiert
baseline translation: präsident václav klaus hat wieder kommentiert das problem der globalen erwärmung
(the lmbot output places the participle "kommentiert" in the correct clause-final position; the baseline does not)

Statistical Machine Translation: Evaluation [POPOVA, 2014]
Russian-to-English, WMT 13 translation task; YANDEX corpus (approx. 1 million sentence pairs)

  System                       BLEU-4
  winner [PINO et al., 2013]   25.9   (hierarchical phrase-based)
  hierarchical phrase-based    21.9
  lmbot string-to-tree         20.7
  string-to-tree               19.8

Statistical Machine Translation: Pipeline
Input → Parser → Translation model → Language model → Output
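The pipeline is just a composition of stages. A toy sketch with placeholder stages; all three functions are hypothetical stand-ins, since in the real system the parser produces a weighted tree automaton and the translation model is an XMBOT.

```python
# The pipeline above as plain function composition.  All three stage
# functions are hypothetical stand-ins for the real components.
from functools import reduce

def parse(sentence):                  # stand-in parser: one flat "tree"
    return ("S",) + tuple(sentence.split())

def translate(tree):                  # stand-in translation model
    return list(tree[1:])             # identity on the yield

def rescore(tokens):                  # stand-in language model
    return " ".join(tokens)

stages = [parse, translate, rescore]

def pipeline(sentence):
    return reduce(lambda value, stage: stage(value), stages, sentence)

print(pipeline("they were looking at him"))  # -> they were looking at him
```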

Statistical Machine Translation
Definition (One-symbol normal form)
An XMBOT (Q, Σ, Δ, I, R) is in one-symbol normal form if l contains at most one (occurrence of a) symbol of Σ for all (l, q, r) ∈ R.

Theorem [ENGELFRIET et al., 2009]
For every XMBOT there exists an equivalent XMBOT in one-symbol normal form.
[Figure: a two-symbol rule for "NP h / NP him" split into one-symbol rules using a fresh state q1]

Statistical Machine Translation
Transformation into one-symbol normal form: a linear-time procedure.
[Figure: a VP rule over the states q_ynzran, q_PP-CLR, q_PP-MNR split into one-symbol rules using fresh states q2, q3 with ε-output]
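The rule-splitting idea can be illustrated on the simplest special case, a unary left-hand side a(b(c(q))): the rule is split into a chain of one-symbol rules connected by fresh states. This sketch covers only that case; the general linear-time construction of [ENGELFRIET et al., 2009] handles arbitrary rules, and the fresh-state names f1, f2, ... are hypothetical.

```python
# Splitting a rule with a unary left-hand side into one-symbol rules.
# The general linear-time construction handles arbitrary rules; the
# state names f1, f2, ... are hypothetical fresh states.
import itertools

def split_unary_rule(symbols, state, target, fresh):
    """a(b(c(q))) -> target  becomes  c(q) -> f1, b(f1) -> f2, a(f2) -> target."""
    rules, current = [], state
    inner_to_outer = list(reversed(symbols))
    for i, symbol in enumerate(inner_to_outer):
        nxt = target if i == len(inner_to_outer) - 1 else next(fresh)
        rules.append(((symbol, current), nxt))   # exactly one symbol per rule
        current = nxt
    return rules

fresh = (f"f{i}" for i in itertools.count(1))
for rule in split_unary_rule(["a", "b", "c"], "q", "q_out", fresh):
    print(rule)
# -> (('c', 'q'), 'f1')
#    (('b', 'f1'), 'f2')
#    (('a', 'f2'), 'q_out')
```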

Statistical Machine Translation: Implementation
- the parse forest is often given as a weighted tree automaton
- parsing yields an (unambiguous) weighted tree automaton for the parses of the input sentence
- this is an efficient representation of L: T_Σ → ℝ≥0

Approach
- we can intersect all (weighted) parses with our translation model (XMBOT)
- improved stability under parse errors (but only at decode time)

Statistical Machine Translation
Definition (Input product)
1 weighted translation τ: T_Σ × T_Δ → ℝ≥0
2 weighted language p: Σ* → ℝ≥0 (language model)
pτ: T_Σ × T_Δ → ℝ≥0 with (t, u) ↦ τ(t, u) · p(yd(t))
[Figure: an automaton fragment with states s1, s2, s3 and a transition weight δ(s1, α) illustrating the product construction]

Statistical Machine Translation
Definition (Input product)
1 weighted translation τ: T_Σ × T_Δ → ℝ≥0
2 weighted tree language L: T_Σ → ℝ≥0 (parses)
Lτ: T_Σ × T_Δ → ℝ≥0 with (t, u) ↦ τ(t, u) · L(t)

Theorem [MALETTI, 2011]
The product of a wXMBOT M with
- a weighted automaton A on the input side has size O(|M| · |A|³); with a wta A: O(|M| · |A|)
- a weighted automaton A on the output side has size O(|M| · |A|^(2·rk(M)+2)); with a wta A: O(|M| · |A|^rk(M))
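On explicit finite maps the input product is a one-liner; a real system of course keeps τ as a wXMBOT and L as a weighted tree automaton, so the sketch below only mirrors the defining equation (t, u) ↦ τ(t, u) · L(t). The trees "t1", "t2" and all weights are made-up toy data.

```python
# The input product on explicit finite maps (toy data; real systems keep
# tau as a wXMBOT and L as a weighted tree automaton).

def input_product(tau, L):
    """tau: (t, u) -> weight; L: t -> weight; result: (t, u) -> tau(t,u)*L(t)."""
    return {(t, u): w * L.get(t, 0.0) for (t, u), w in tau.items()}

tau = {("t1", "u1"): 0.5, ("t1", "u2"): 0.25, ("t2", "u1"): 1.0}
L = {"t1": 0.8}                    # parse weights; t2 is not a parse
print(input_product(tau, L))
# -> {('t1', 'u1'): 0.4, ('t1', 'u2'): 0.2, ('t2', 'u1'): 0.0}
```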

Statistical Machine Translation: Example (Input product)
[Figure: a rule with states q_w, q_VP paired with parser states S-1, VP-3, VP-5, yielding product states such as (S-1, q_S), (VP-5, q_w), (VP-3, q_VP)]

Statistical Machine Translation: Pipeline
Input → Parser → Translation model → Language model → Output
- exact decoding of the first two stages (the highlighted part of the pipeline)
- integrating the language model would require O(|M| · |A|^(2·rk(M)+2))

Statistical Machine Translation: Implementation [QUERNHEIM, 2014]
1 exactly computes a wta representing the derivations for the first two models
2 extracts the k-best derivations
3 reranks them by the language model (i.e., multiplies their score with the LM score and resorts)

Disadvantages
1 slow
2 language model not integrated (needs strict structure)
3 strictness leads to coverage problems
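Steps 2 and 3 can be sketched concretely: extract the k best derivations, multiply each score by its LM score, and resort. All scores below are made up, and the real system extracts the k-best derivations from the wta rather than from a flat list.

```python
# k-best extraction and LM reranking on toy data (the real system reads
# the derivations off a weighted tree automaton).
import heapq

def k_best(derivations, k):
    """derivations: iterable of (score, output) pairs."""
    return heapq.nlargest(k, derivations)

def rerank(candidates, lm_score):
    """Multiply each model score by its LM score and resort."""
    return sorted(((score * lm_score(out), out) for score, out in candidates),
                  reverse=True)

derivations = [(0.75, "a"), (0.5, "b"), (0.25, "c"), (0.625, "d")]
lm = {"a": 0.25, "b": 1.0, "d": 0.5}
top = k_best(derivations, 3)               # [(0.75, 'a'), (0.625, 'd'), (0.5, 'b')]
print(rerank(top, lambda out: lm[out]))    # -> [(0.5, 'b'), (0.3125, 'd'), (0.1875, 'a')]
```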

Statistical Machine Translation: Evaluation
English-to-German, WMT 13 and WMT 14 translation tasks; 7th version of EUROPARL, news commentary, and Common Crawl (approx. … million sentence pairs)

  System                        BLEU-4
  winner                        …
  Moses (SCFG) tree-to-tree     13.1
  ExactMBOT tree-to-tree        …
  Moses (SCFG) string-to-tree   14.7
  lmbot string-to-tree          15.5
  Moses phrase-based            17.5

Tree Transducers in Machine Translation: Composition

Composition
τ₁ ; τ₂ = {(s, u) | ∃t: (s, t) ∈ τ₁, (t, u) ∈ τ₂}
- compositions support modular development
- they allow integration of external knowledge sources

Question: given a class C of transformations, is there an n ∈ ℕ such that Cⁿ = ⋃_{k≥1} C^k, where C^k = C ; ⋯ ; C (k times)?
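On explicit finite relations, composition and its powers C^k are directly executable; this sketch mirrors the defining equation on toy pairs.

```python
# The composition tau1 ; tau2 and its powers, on explicit finite
# relations (sets of pairs); the pairs below are toy data.
from functools import reduce

def compose(tau1, tau2):
    """{(s, u) | exists t: (s, t) in tau1 and (t, u) in tau2}."""
    return {(s, u) for (s, t1) in tau1 for (t2, u) in tau2 if t1 == t2}

def power(tau, k):
    """tau ; tau ; ... ; tau (k times, k >= 1)."""
    return reduce(compose, [tau] * k)

tau1 = {("s1", "t1"), ("s2", "t2")}
tau2 = {("t1", "u1"), ("t1", "u2"), ("t3", "u3")}
print(sorted(compose(tau1, tau2)))   # -> [('s1', 'u1'), ('s1', 'u2')]
print(power({(1, 2), (2, 3), (3, 4)}, 2))
```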

[Figure: Hasse diagram of the composition hierarchy relating l(n)-xmbot, l-xtop², le-xtop², ln-xtop², lne-xtop², l-xtop^R, and the base classes l-xtop, le-xtop, ln-xtop, lne-xtop]

Composition closure (number of compositions needed)
                                  l-top   l-xtop   l-xmbot
  ε-free, strict, nondeleting      1        ?        1
  ε-free, strict                   2        ?        1
  ε-free                           2        ?        1
  otherwise (without delabeling)   2        ?        1

Abbreviations: e = ε-free; d = delabeling; s = strict; n = nondeleting

Theorem [FÜLÖP, MALETTI, 2013]
A delabeling can be switched from back to front:
le[s]-xtop^R ; l[s]d-top^R ⊆ l[s]d-top^R ; lesn-xtop

Notes
- the other transducer becomes strict and nondeleting
- the other transducer loses its look-ahead

Abbreviations: e = ε-free; d = delabeling; s = strict; n = nondeleting

Theorem
(le[s]-xtop^R)^n ⊆ l[s]d-top^R ; lesn-xtop² ⊆ (le[s]-xtop^R)³

Proof.
(le[s]-xtop^R)^{n+1}
  ⊆ le[s]-xtop^R ; l[s]d-top^R ; lesn-xtop²
  ⊆ l[s]d-top^R ; lesn-xtop³
  ⊆ l[s]d-top^R ; lesn-xtop²

Corollary
le[s]-xtopⁿ ⊆ l[s]d-top ; lesn-xtop² ⊆ le[s]-xtop⁴

Proof. Uses only the standard encoding of look-ahead.

Composition closure
                        l-top   le-xtop
  strict, nondeleting     1       2
  strict, look-ahead      1
  strict                  2
  look-ahead              1       2

Theorem
A delabeling homomorphism can be moved from front to back:
lsd-hom ; les-xtop ⊆ lesn-xtop ; lsd-hom

Notes
- the other transducer becomes nondeleting
- the other transducer needs to be strict and have no look-ahead

Theorem
(les-xtop^R)^n ⊆ lesn-xtop ; les-xtop ⊆ les-xtop²

Proof.
(les-xtop^R)^{n+1}
  ⊆ (les-xtop^R)^n ; les-xtop
  ⊆ lesn-xtop ; lsd-hom ; les-xtop²
  ⊆ lesn-xtop³ ; lsd-hom
  ⊆ lesn-xtop² ; lsd-hom
  ⊆ lesn-xtop ; les-xtop^R
  ⊆ lesn-xtop ; les-xtop

Composition closure
                        l-top   le-xtop
  strict, nondeleting     1       2
  strict, look-ahead      1       2
  strict                  2       2
  look-ahead

Definition (Hierarchy properties)
A dependency ⟨t, D, u⟩ is input hierarchical if, for all (v₁, w₁), (v₂, w₂) ∈ D with v₁ < v₂:
1 w₂ ≤ w₁, or
2 there is a (v₁, w₁′) ∈ D with w₁′ ≤ w₂.

It is strictly input hierarchical if, for all (v₁, w₁), (v₂, w₂) ∈ D:
1 v₁ < v₂ implies w₁ ≤ w₂
2 v₁ = v₂ implies w₁ ≤ w₂ or w₂ ≤ w₁
(here < and ≤ denote the proper and reflexive prefix orders on positions)
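Positions can be encoded as tuples of child indices, with ≤ the prefix order. The following sketch checks the strictly-input-hierarchical condition exactly as reconstructed above; both example link sets are hypothetical toy data.

```python
# Prefix-order machinery for the definitions above: positions are tuples
# of child indices, and D is a set of links (v, w).  The check follows
# the "strictly input hierarchical" condition as stated in the text.

def is_prefix(p, q):
    return len(p) <= len(q) and q[:len(p)] == p

def proper_prefix(p, q):
    return len(p) < len(q) and q[:len(p)] == p

def strictly_input_hierarchical(D):
    for v1, w1 in D:
        for v2, w2 in D:
            if proper_prefix(v1, v2) and not is_prefix(w1, w2):
                return False
            if v1 == v2 and not (is_prefix(w1, w2) or is_prefix(w2, w1)):
                return False
    return True

D_ok = {((), ()), ((0,), (1,)), ((0, 0), (1, 0))}
D_bad = {((), (0,)), ((0,), (1,))}   # the deeper link escapes its ancestor's link
print(strictly_input_hierarchical(D_ok))   # -> True
print(strictly_input_hierarchical(D_bad))  # -> False
```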

Definition (Distance properties)
A dependency ⟨t, D, u⟩ is input link-distance bounded by b ∈ ℕ if for all (v₁, w₁), (v₁v, w₂) ∈ D with |v| > b there is a (v₁v′, w₃) ∈ D such that v′ < v and 1 ≤ |v′| ≤ b.

It has strict input link-distance bounded by b if for all v₁, v₁v ∈ pos(t) with |v| > b there is a (v₁v′, w₃) ∈ D such that v′ < v and 1 ≤ |v′| ≤ b.

Model vs. property overview
                   hierarchical           link-distance bounded
  Model            input      output      input      output
  ln-xtop          strictly   strictly    strictly   strictly
  l-xtop^R         strictly   strictly    strictly
  l-mbot           strictly              strictly

Theorem [MALETTI et al., 2009]
les-xtop ⊊ les-xtop^R, but les-xtop² = (les-xtop^R)²

Proof.
- look-ahead adds power at the first level
- none of the basic classes is closed under composition


Theorem [FÜLÖP, MALETTI, 2013]
le-xtop² ⊊ (le-xtop^R)² and le-xtop³ ⊊ (le-xtop^R)³

[Figure: the counterexample trees t, u, s built from σ₁, σ₂ and subtrees t₁, …, tₙ, with the case analysis on whether v is below v_{i-1}, v_i, or v_{i+1}]

Composition closure
                        l-top   le-xtop
  strict, nondeleting     1       2
  strict, look-ahead      1       2
  strict                  2       2
  look-ahead                      (4)

Composition closure
                        l-top   l-xtop   l-xtop^R
  ε-free, nondeleting     1
  strict                  2
  nondeleting             1
  strict, nondeleting     1        2

Proof. Uses a completely different technique [FÜLÖP, MALETTI, 2013].

Composition closure
                                  l-top   l-xtop   l-xtop^R   l-mbot
  ε-free, strict, nondeleting
  ε-free, strict
  ε-free
  otherwise (w/o delabeling)        2                           1

[Figure: final hierarchy diagram relating l(n)-mbot, le-xtop⁴, le-xtop³, l-xtop², le-xtop², ln-xtop², lne-xtop², l-xtop^R, and the base classes l-xtop, le-xtop, ln-xtop, lne-xtop]

Literature: selected references
- ARNOLD, DAUCHET: Morphismes et bimorphismes d'arbres. Theoret. Comput. Sci. 20, 1982
- BRAUNE, SEEMANN, QUERNHEIM, MALETTI: Shallow local multi bottom-up tree transducers in statistical machine translation. Proc. ACL, 2013
- FÜLÖP, MALETTI: Composition closure of ε-free linear extended top-down tree transducers. Proc. DLT, 2013
- MALETTI, GRAEHL, HOPKINS, KNIGHT: The power of extended top-down tree transducers. SIAM J. Comput. 39, 2009


More information

Variational Decoding for Statistical Machine Translation

Variational Decoding for Statistical Machine Translation Variational Decoding for Statistical Machine Translation Zhifei Li, Jason Eisner, and Sanjeev Khudanpur Center for Language and Speech Processing Computer Science Department Johns Hopkins University 1

More information

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions Wei Lu and Hwee Tou Ng National University of Singapore 1/26 The Task (Logical Form) λx 0.state(x 0

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding

More information

Compositions of Extended Top-down Tree Transducers

Compositions of Extended Top-down Tree Transducers Compositions of Extended Top-down Tree Transducers Andreas Maletti ;1 International Computer Science Institute 1947 Center Street, Suite 600, Berkeley, CA 94704, USA Abstract Unfortunately, the class of

More information

Aspects of Tree-Based Statistical Machine Translation

Aspects of Tree-Based Statistical Machine Translation Aspects of Tree-Based Statistical Machine Translation Marcello Federico Human Language Technology FBK 2014 Outline Tree-based translation models: Synchronous context free grammars Hierarchical phrase-based

More information

Compositions of Top-down Tree Transducers with ε-rules

Compositions of Top-down Tree Transducers with ε-rules Compositions of Top-down Tree Transducers with ε-rules Andreas Maletti 1, and Heiko Vogler 2 1 Universitat Rovira i Virgili, Departament de Filologies Romàniques Avinguda de Catalunya 35, 43002 Tarragona,

More information

Syntax-Directed Translations and Quasi-alphabetic Tree Bimorphisms Revisited

Syntax-Directed Translations and Quasi-alphabetic Tree Bimorphisms Revisited Syntax-Directed Translations and Quasi-alphabetic Tree Bimorphisms Revisited Andreas Maletti and C t lin Ionuµ Tîrn uc Universitat Rovira i Virgili Departament de Filologies Romàniques Av. Catalunya 35,

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

SYNCHRONOUS GRAMMARS AS TREE TRANSDUCERS

SYNCHRONOUS GRAMMARS AS TREE TRANSDUCERS SYNCHRONOUS GRAMMARS AS TREE TRANSDUCERS STUART M. SHIEBER ABSTRACT. Tree transducer formalisms were developed in the formal language theory community as generalizations of finite-state transducers from

More information

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS Automata Theory Lecture on Discussion Course of CS2 This Lecture is about Mathematical Models of Computation. Why Should I Care? - Ways of thinking. - Theory can drive practice. - Don t be an Instrumentalist.

More information

Syntax-based Statistical Machine Translation

Syntax-based Statistical Machine Translation Syntax-based Statistical Machine Translation Philip Williams and Philipp Koehn 29 October 2014 Part I Part II - Introduction - Rule Extraction Part III - Decoding Part IV - Extensions Syntax-based Statistical

More information

Syntax-Based Decoding

Syntax-Based Decoding Syntax-Based Decoding Philipp Koehn 9 November 2017 1 syntax-based models Synchronous Context Free Grammar Rules 2 Nonterminal rules NP DET 1 2 JJ 3 DET 1 JJ 3 2 Terminal rules N maison house NP la maison

More information

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u, 1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u, v, x, y, z as per the pumping theorem. 3. Prove that

More information

Quasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti

Quasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti Quasi-Synchronous Phrase Dependency Grammars for Machine Translation Kevin Gimpel Noah A. Smith 1 Introduction MT using dependency grammars on phrases Phrases capture local reordering and idiomatic translations

More information

Decoding and Inference with Syntactic Translation Models

Decoding and Inference with Syntactic Translation Models Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP

More information

Efficient Incremental Decoding for Tree-to-String Translation

Efficient Incremental Decoding for Tree-to-String Translation Efficient Incremental Decoding for Tree-to-String Translation Liang Huang 1 1 Information Sciences Institute University of Southern California 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292, USA

More information

Introduction to Computers & Programming

Introduction to Computers & Programming 16.070 Introduction to Computers & Programming Theory of computation: What is a computer? FSM, Automata Prof. Kristina Lundqvist Dept. of Aero/Astro, MIT Models of Computation What is a computer? If you

More information

Structure and Complexity of Grammar-Based Machine Translation

Structure and Complexity of Grammar-Based Machine Translation Structure and of Grammar-Based Machine Translation University of Padua, Italy New York, June 9th, 2006 1 2 Synchronous context-free grammars Definitions Computational problems 3 problem SCFG projection

More information

Part I - Introduction Part II - Rule Extraction Part III - Decoding Part IV - Extensions

Part I - Introduction Part II - Rule Extraction Part III - Decoding Part IV - Extensions Syntax-based Statistical Machine Translation Philip Williams and Philipp Koehn 29 October 2014 Part I - Introduction Part II - Rule Extraction Part III - Decoding Part IV - Extensions Syntax-based Statistical

More information

Tree transformations and dependencies

Tree transformations and dependencies Tree transformations and deendencies Andreas Maletti Universität Stuttgart Institute for Natural Language Processing Azenbergstraße 12, 70174 Stuttgart, Germany andreasmaletti@imsuni-stuttgartde Abstract

More information

Finite State Transducers

Finite State Transducers Finite State Transducers Eric Gribkoff May 29, 2013 Original Slides by Thomas Hanneforth (Universitat Potsdam) Outline 1 Definition of Finite State Transducer 2 Examples of FSTs 3 Definition of Regular

More information

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology

More information

Bimorphism Machine Translation

Bimorphism Machine Translation Bimorphism Machine Translation Von der Fakultät für Mathematik und Informatik der Universität Leipzig angenommene D I S S E R T A T I O N zur Erlangung des akademischen Grades DOCTOR PHILOSOPHIAE (Dr.

More information

Fuzzy Tree Transducers

Fuzzy Tree Transducers Advances in Fuzzy Mathematics. ISSN 0973-533X Volume 11, Number 2 (2016), pp. 99-112 Research India Publications http://www.ripublication.com Fuzzy Tree Transducers S. R. Chaudhari and M. N. Joshi Department

More information

Finite-state Machines: Theory and Applications

Finite-state Machines: Theory and Applications Finite-state Machines: Theory and Applications Unweighted Finite-state Automata Thomas Hanneforth Institut für Linguistik Universität Potsdam December 10, 2008 Thomas Hanneforth (Universität Potsdam) Finite-state

More information

Natural Language Processing (CSEP 517): Machine Translation

Natural Language Processing (CSEP 517): Machine Translation Natural Language Processing (CSEP 57): Machine Translation Noah Smith c 207 University of Washington nasmith@cs.washington.edu May 5, 207 / 59 To-Do List Online quiz: due Sunday (Jurafsky and Martin, 2008,

More information

Learning to translate with neural networks. Michael Auli

Learning to translate with neural networks. Michael Auli Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each

More information

Pure and O-Substitution

Pure and O-Substitution International Journal of Foundations of Computer Science c World Scientific Publishing Company Pure and O-Substitution Andreas Maletti Department of Computer Science, Technische Universität Dresden 01062

More information

Cross-Entropy and Estimation of Probabilistic Context-Free Grammars

Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Anna Corazza Department of Physics University Federico II via Cinthia I-8026 Napoli, Italy corazza@na.infn.it Giorgio Satta Department

More information

What we have done so far

What we have done so far What we have done so far DFAs and regular languages NFAs and their equivalence to DFAs Regular expressions. Regular expressions capture exactly regular languages: Construct a NFA from a regular expression.

More information

Tree Transducers in Machine Translation

Tree Transducers in Machine Translation Tree Transducers in Machine Translation Andreas Maletti Universitat Rovira i Virgili Tarragona, pain andreas.maletti@urv.cat Jena August 23, 2010 Tree Transducers in MT A. Maletti 1 Machine translation

More information

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism, CS 54, Lecture 2: Finite Automata, Closure Properties Nondeterminism, Why so Many Models? Streaming Algorithms 0 42 Deterministic Finite Automata Anatomy of Deterministic Finite Automata transition: for

More information

Extended Multi Bottom-Up Tree Transducers

Extended Multi Bottom-Up Tree Transducers Extended Multi Bottom-Up Tree Transducers Joost Engelfriet 1, Eric Lilin 2, and Andreas Maletti 3;? 1 Leiden Instite of Advanced Comper Science Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands

More information

Introduction to Automata

Introduction to Automata Introduction to Automata Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /

More information

Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Joern Wuebker, Hermann Ney Human Language Technology and Pattern Recognition Group Computer Science

More information

Tree Adjoining Grammars

Tree Adjoining Grammars Tree Adjoining Grammars TAG: Parsing and formal properties Laura Kallmeyer & Benjamin Burkhardt HHU Düsseldorf WS 2017/2018 1 / 36 Outline 1 Parsing as deduction 2 CYK for TAG 3 Closure properties of TALs

More information

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata Salil Vadhan October 4, 2012 Reading: Sipser, 2.2. Another example of a CFG (with proof) L = {x {a, b} : x has the same # of a s and

More information

Composition and Decomposition of Extended Multi Bottom-Up Tree Transducers?

Composition and Decomposition of Extended Multi Bottom-Up Tree Transducers? Composition and Decomposition of Extended Multi Bottom-Up Tree Transducers? Joost Engelfriet 1, Eric Lilin 2, and Andreas Maletti 3;?? 1 Leiden Institute of Advanced Computer Science Leiden University,

More information

Context-Free Grammars (and Languages) Lecture 7

Context-Free Grammars (and Languages) Lecture 7 Context-Free Grammars (and Languages) Lecture 7 1 Today Beyond regular expressions: Context-Free Grammars (CFGs) What is a CFG? What is the language associated with a CFG? Creating CFGs. Reasoning about

More information

Feature-Rich Translation by Quasi-Synchronous Lattice Parsing

Feature-Rich Translation by Quasi-Synchronous Lattice Parsing Feature-Rich Translation by Quasi-Synchronous Lattice Parsing Kevin Gimpel and Noah A. Smith Language Technologies Institute School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213,

More information

Syntax Analysis Part I

Syntax Analysis Part I 1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2013 2 Position of a Parser in the Compiler Model Source Program Lexical Analyzer

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Machine Translation. Uwe Dick

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Machine Translation. Uwe Dick Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Machine Translation Uwe Dick Google Translate Rosetta Stone Hieroglyphs Demotic Greek Machine Translation Automatically translate

More information

Dual Decomposition for Natural Language Processing. Decoding complexity

Dual Decomposition for Natural Language Processing. Decoding complexity Dual Decomposition for atural Language Processing Alexander M. Rush and Michael Collins Decoding complexity focus: decoding problem for natural language tasks motivation: y = arg max y f (y) richer model

More information

Why Synchronous Tree Substitution Grammars?

Why Synchronous Tree Substitution Grammars? Why ynchronous Tr ustitution Grammars? Andras Maltti Univrsitat Rovira i Virgili Tarragona, pain andras.maltti@urv.cat Los Angls Jun 4, 2010 Why ynchronous Tr ustitution Grammars? A. Maltti 1 Motivation

More information

Shift-Reduce Word Reordering for Machine Translation

Shift-Reduce Word Reordering for Machine Translation Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,

More information

Algorithms for Syntax-Aware Statistical Machine Translation

Algorithms for Syntax-Aware Statistical Machine Translation Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial

More information

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harry Lewis October 8, 2013 Reading: Sipser, pp. 119-128. Pushdown Automata (review) Pushdown Automata = Finite automaton

More information

Minimization of Weighted Automata

Minimization of Weighted Automata Minimization of Weighted Automata Andreas Maletti Universitat Rovira i Virgili Tarragona, Spain Wrocław May 19, 2010 Minimization of Weighted Automata Andreas Maletti 1 In Collaboration with ZOLTÁN ÉSIK,

More information

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis Context-free grammars Derivations Parse Trees Left-recursive grammars Top-down parsing non-recursive predictive parsers construction of parse tables Bottom-up parsing shift/reduce parsers LR parsers GLR

More information

UNIT II REGULAR LANGUAGES

UNIT II REGULAR LANGUAGES 1 UNIT II REGULAR LANGUAGES Introduction: A regular expression is a way of describing a regular language. The various operations are closure, union and concatenation. We can also find the equivalent regular

More information

Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)

Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements

More information

Lecture 3: Nondeterministic Finite Automata

Lecture 3: Nondeterministic Finite Automata Lecture 3: Nondeterministic Finite Automata September 5, 206 CS 00 Theory of Computation As a recap of last lecture, recall that a deterministic finite automaton (DFA) consists of (Q, Σ, δ, q 0, F ) where

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY 5-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY NON-DETERMINISM and REGULAR OPERATIONS THURSDAY JAN 6 UNION THEOREM The union of two regular languages is also a regular language Regular Languages Are

More information

A Discriminative Model for Semantics-to-String Translation

A Discriminative Model for Semantics-to-String Translation A Discriminative Model for Semantics-to-String Translation Aleš Tamchyna 1 and Chris Quirk 2 and Michel Galley 2 1 Charles University in Prague 2 Microsoft Research July 30, 2015 Tamchyna, Quirk, Galley

More information

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013 CMPSCI 250: Introduction to Computation Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013 λ-nfa s to NFA s to DFA s Reviewing the Three Models and Kleene s Theorem The Subset

More information

Multiple System Combination. Jinhua Du CNGL July 23, 2008

Multiple System Combination. Jinhua Du CNGL July 23, 2008 Multiple System Combination Jinhua Du CNGL July 23, 2008 Outline Introduction Motivation Current Achievements Combination Strategies Key Techniques System Combination Framework in IA Large-Scale Experiments

More information

Lagrangian Relaxation Algorithms for Inference in Natural Language Processing

Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Alexander M. Rush and Michael Collins (based on joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Roi Reichart, David

More information

Syntax Analysis Part I

Syntax Analysis Part I 1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2013 2 Position of a Parser in the Compiler Model Source Program Lexical Analyzer

More information

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata. Code No: R09220504 R09 Set No. 2 II B.Tech II Semester Examinations,December-January, 2011-2012 FORMAL LANGUAGES AND AUTOMATA THEORY Computer Science And Engineering Time: 3 hours Max Marks: 75 Answer

More information

Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing

Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for atural Language Processing Alexander M. Rush (based on joint work with Michael Collins, Tommi Jaakkola, Terry Koo, David Sontag) Uncertainty

More information

Synchronous Grammars

Synchronous Grammars ynchronous Grammars ynchronous grammars are a way of simultaneously generating pairs of recursively related strings ynchronous grammar w wʹ ynchronous grammars were originally invented for programming

More information

Intro to Theory of Computation

Intro to Theory of Computation Intro to Theory of Computation 1/19/2016 LECTURE 3 Last time: DFAs and NFAs Operations on languages Today: Nondeterminism Equivalence of NFAs and DFAs Closure properties of regular languages Sofya Raskhodnikova

More information

Chapter 3. Regular grammars

Chapter 3. Regular grammars Chapter 3 Regular grammars 59 3.1 Introduction Other view of the concept of language: not the formalization of the notion of effective procedure, but set of words satisfying a given set of rules Origin

More information

Multi-Task Word Alignment Triangulation for Low-Resource Languages

Multi-Task Word Alignment Triangulation for Low-Resource Languages Multi-Task Word Alignment Triangulation for Low-Resource Languages Tomer Levinboim and David Chiang Department of Computer Science and Engineering University of Notre Dame {levinboim.1,dchiang}@nd.edu

More information

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill

More information

Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing

Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang (lhuang@isi.edu) Big Picture we have already covered...

More information

Lecture Notes on Inductive Definitions

Lecture Notes on Inductive Definitions Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 August 28, 2003 These supplementary notes review the notion of an inductive definition and give

More information

Translation as Weighted Deduction

Translation as Weighted Deduction Translation as Weighted Deduction Adam Lopez University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB United Kingdom alopez@inf.ed.ac.uk Abstract We present a unified view of many translation algorithms

More information

Overview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments

Overview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments Overview Learning phrases from alignments 6.864 (Fall 2007) Machine Translation Part III A phrase-based model Decoding in phrase-based models (Thanks to Philipp Koehn for giving me slides from his EACL

More information

Statistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks

Statistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School - University of Pisa Pisa, 7-19 May 008 Part III: Search Problem 1 Complexity issues A search: with single

More information

DT2118 Speech and Speaker Recognition

DT2118 Speech and Speaker Recognition DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language

More information

Shift-Reduce Word Reordering for Machine Translation

Shift-Reduce Word Reordering for Machine Translation Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,

More information

Translation as Weighted Deduction

Translation as Weighted Deduction Translation as Weighted Deduction Adam Lopez University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB United Kingdom alopez@inf.ed.ac.uk Abstract We present a unified view of many translation algorithms

More information

Succinctness of the Complement and Intersection of Regular Expressions

Succinctness of the Complement and Intersection of Regular Expressions Succinctness of the Complement and Intersection of Regular Expressions Wouter Gelade and Frank Neven Hasselt University and transnational University of Limburg February 21, 2008 W. Gelade (Hasselt University)

More information

Multi-Source Neural Translation

Multi-Source Neural Translation Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu In the neural encoder-decoder

More information