Decoding and Inference with Syntactic Translation Models

1 Decoding and Inference with Syntactic Translation Models March 5, 2013

2-9 CFGs (figures): a derivation is built step by step with the grammar S → NP VP, VP → NP V, plus lexical rules V → …, NP → …, NP → … whose source-language terminals were lost in transcription; the final slide shows the fully derived source string as Output.

10-12 Synchronous CFGs
S → NP VP : 1 2 (monotonic)
VP → NP V : 2 1 (inverted)
V → … : ate
NP → … : John
NP → … : an apple
(the source sides of the lexical rules are terminals in the source language, lost in transcription)
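
To make synchronous generation (illustrated on the next slides) concrete, here is a minimal sketch in plain Python, assuming a toy encoding of the rules above; it is not the lecture's code, and the source terminals jon-ga, ringo-o, tabeta are borrowed from the tree-to-string slides later in the deck.

```python
# Minimal synchronous-CFG generation sketch (illustrative, not the lecture's code).
# Each rule pairs a source RHS with a target RHS; tuples like ("NP", 1) are
# co-indexed nonterminal slots, plain strings are terminals.
import random

RULES = {
    "S":  [([("NP", 1), ("VP", 2)], [("NP", 1), ("VP", 2)])],   # 1 2 (monotonic)
    "VP": [([("NP", 1), ("V", 2)],  [("V", 2),  ("NP", 1)])],   # 2 1 (inverted)
    "V":  [(["tabeta"], ["ate"])],
    "NP": [(["jon-ga"], ["John"]), (["ringo-o"], ["an", "apple"])],
}

def generate(symbol):
    """Expand `symbol`, returning (source words, target words) built in lockstep."""
    src_rhs, tgt_rhs = random.choice(RULES[symbol])
    # Expand each co-indexed nonterminal exactly once; the same expansion is
    # spliced into both sides, possibly in different orders.
    expansions = {idx: generate(nt)
                  for nt, idx in (it for it in src_rhs if isinstance(it, tuple))}

    def realize(rhs, side):
        out = []
        for item in rhs:
            out.extend(expansions[item[1]][side] if isinstance(item, tuple) else [item])
        return out

    return realize(src_rhs, 0), realize(tgt_rhs, 1)

src, tgt = generate("S")
print(" ".join(src), "|||", " ".join(tgt))   # e.g. jon-ga ringo-o tabeta ||| John ate an apple
```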

13-20 Synchronous generation (figures): a source tree and a target tree are grown in lockstep, one synchronous rule at a time (S, then NP VP on both sides, then the inverted VP expansion and the lexical rules), ending with the target yield John ate an apple. Output: ( … : John ate an apple); the source yield was lost in transcription.

21-27 Translation as parsing (figures): parse the source sentence, recognizing NP, NP, and V, then VP, then S; then project the parse to the target side, giving the target tree S(NP, VP(V, NP)) with yield John ate an apple.

28 A closer look at parsing
- Parsing is usually done with dynamic programming: share common computations and structure; represent an exponential number of alternatives in polynomial space
- With SCFGs there are two kinds of ambiguity: source parse ambiguity and translation ambiguity
- Parse forests can represent both!

29 A closer look at parsing
- Any monolingual parser can be used (most often: CKY or CKY variants)
- Parsing complexity is O(n³ |G|³): cubic in the length of the sentence (n³) and cubic in the number of non-terminals (|G|³)
- Adding nonterminal types increases parsing complexity substantially!
- With few NTs, exhaustive parsing is tractable

30 Parsing as deduction
An inference rule has antecedents A : u and B : v, a side condition φ, and a consequent C : w. If A and B are true with weights u and v, and φ is also true, then C is true with weight w.

31 Example: CKY
Inputs: f = ⟨f₁, f₂, ..., f_ℓ⟩, the input sentence; G, a context-free grammar in Chomsky normal form.
Item form: [X, i, j], meaning a subtree rooted with NT type X spanning i to j has been recognized.

32 Example: CKY
Goal: [S, 0, ℓ]
Axioms: [X, i−1, i] : w for each rule (X → f_i) ∈ G with weight w
Inference rule: from [X, i, k] : u and [Y, k, j] : v, derive [Z, i, j] : u·v·w, provided (Z → X Y) ∈ G with weight w
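
A minimal sketch of this weighted CKY recognizer in Python (my own encoding of the grammar, using max-product weights; the toy run uses the I saw her duck grammar from the next slides with invented rule weights).

```python
# Weighted CKY over a CNF grammar, following the deduction rules above:
# axioms [X, i-1, i] for lexical rules X -> f_i, and the inference rule
# [X, i, k] + [Y, k, j] + (Z -> X Y) => [Z, i, j] with weight u * v * w.
from collections import defaultdict

def cky(words, lexical, binary, goal="S"):
    """lexical: dict word -> list of (NT, weight); binary: list of (Z, X, Y, weight).
    Returns the best weight of a goal item spanning the whole sentence."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):                      # axioms
        for nt, weight in lexical.get(w, []):
            chart[(i, i + 1)][nt] = max(chart[(i, i + 1)].get(nt, 0.0), weight)
    for span in range(2, n + 1):                       # wider spans, bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for z, x, y, w in binary:
                    u = chart[(i, k)].get(x)
                    v = chart[(k, j)].get(y)
                    if u is not None and v is not None:
                        score = u * v * w
                        if score > chart[(i, j)].get(z, 0.0):
                            chart[(i, j)][z] = score
    return chart[(0, n)].get(goal)

# Toy run on the "I saw her duck" grammar from the next slides (weights invented):
lexical = {"I": [("PRP", 1.0)], "saw": [("V", 1.0)],
           "her": [("PRP", 0.6)], "duck": [("NN", 0.5), ("V", 0.5)]}
binary = [("S", "PRP", "VP", 1.0), ("VP", "V", "NP", 0.6), ("VP", "V", "SBAR", 0.4),
          ("NP", "PRP", "NN", 1.0), ("SBAR", "PRP", "V", 1.0)]
print(cky("I saw her duck".split(), lexical, binary))  # best weight of [S, 0, 4]
```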

33-44 Worked example (figures): CKY on the sentence I saw her duck, with the grammar S → PRP VP, VP → V NP, VP → V SBAR, SBAR → PRP V, NP → PRP NN plus lexical rules for PRP, V, and NN. Starting from the axioms PRP[0,1], V[1,2], PRP[2,3], NN[3,4], V[3,4], the chart fills bottom-up with NP[2,4], SBAR[2,4], VP[1,4], and finally the goal item S[0,4]. The last slide shows just the items and the edges that derived them and asks: what is this object?

45 Semantics of hypergraphs
- A generalization of directed graphs
- A special node is designated the goal
- Every edge has a single head and 0 or more tails (the arity of the edge is the number of tails)
- Node labels correspond to LHSs of CFG rules
- A derivation is the generalization of the graph concept of a path to hypergraphs
- Weights multiply along edges in the derivation, and add at nodes (cf. semiring parsing)
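
One way this object might be encoded, as a point of reference for the algorithms below (an illustrative sketch of my own, not the data structures of any particular decoder).

```python
# A bare-bones translation-hypergraph encoding (illustrative; real decoders such
# as cdec or Joshua use richer structures).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Edge:
    head: int                 # index of the head node
    tails: List[int]          # indices of tail nodes; len(tails) is the edge's arity
    src: str                  # source-side label, e.g. "1 de 2"
    tgt: str                  # target-side label, e.g. "2 's 1"
    weight: float = 1.0       # rule/feature weight; multiplies along a derivation

@dataclass
class Node:
    label: str                # LHS nonterminal of the CFG rule(s), e.g. "NP[0,2]"
    in_edges: List[int] = field(default_factory=list)   # edges whose head is this node

@dataclass
class Hypergraph:
    nodes: List[Node]
    edges: List[Edge]
    goal: int                 # index of the designated goal node

    def add_edge(self, edge: Edge) -> int:
        self.edges.append(edge)
        self.nodes[edge.head].in_edges.append(len(self.edges) - 1)
        return len(self.edges) - 1
```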

46 Edge labels
- Edge labels may be a mix of terminals and substitution sites (non-terminals)
- In translation hypergraphs, edges are labeled in both the source and target languages
- The number of substitution sites must equal the arity of the edge and must be the same in both languages
- The two languages may have different orders of the substitution sites
- There is no restriction on the number of terminal symbols

47 Edge labels (figure): a hypergraph with goal node S and edges labeled la lectura : reading, ayer : yesterday, 1 de 2 : 2 's 1, and 1 de 2 : 1 from 2, deriving the pairs {( … : yesterday's reading ), ( … : reading from yesterday )}; the source strings of the pairs were lost in transcription.

48 Inference algorithms
- Viterbi, O(|E| + |V|): find the maximum-weight derivation; requires a partial ordering of weights
- Inside-outside, O(|E| + |V|): compute the marginal (sum) weight of all derivations passing through each edge/node
- k-best derivations, O(|E| + |D_max| k log k): enumerate the k best derivations in the hypergraph; see the IWPT paper by Huang and Chiang (2005)
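
Sketches of the first two algorithms over the toy Hypergraph class above, assuming the nodes are stored in topological order (every tail before its head); this is my illustration, not the Huang and Chiang implementation.

```python
# Viterbi and inside weights over a hypergraph whose nodes are topologically
# ordered (every tail index < head index). Both run in O(|E| + |V|).

def viterbi(hg):
    """Best (max-product) derivation weight of each node; returns the goal's."""
    best = [0.0] * len(hg.nodes)
    for v, node in enumerate(hg.nodes):
        if not node.in_edges:                    # leaf/axiom node
            best[v] = 1.0
            continue
        for e_id in node.in_edges:
            e = hg.edges[e_id]
            w = e.weight
            for t in e.tails:
                w *= best[t]                     # weights multiply along the derivation
            best[v] = max(best[v], w)            # max over incoming edges
    return best[hg.goal]

def inside(hg):
    """Marginal (sum) weight of all derivations of each node; returns the goal's."""
    total = [0.0] * len(hg.nodes)
    for v, node in enumerate(hg.nodes):
        if not node.in_edges:
            total[v] = 1.0
            continue
        for e_id in node.in_edges:
            e = hg.edges[e_id]
            w = e.weight
            for t in e.tails:
                w *= total[t]
            total[v] += w                        # sum over incoming edges
    return total[hg.goal]

# Tiny usage: goal node 2 with one binary edge over two leaves.
hg = Hypergraph(nodes=[Node("NP"), Node("VP"), Node("S")], edges=[], goal=2)
hg.add_edge(Edge(head=2, tails=[0, 1], src="1 2", tgt="1 2", weight=0.5))
print(viterbi(hg), inside(hg))   # 0.5 0.5
```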

49 Things to keep in mind
- Bound on the number of edges: |E| ∈ O(n³ |G|³)
- Bound on the number of nodes: |V| ∈ O(n² |G|)

50 Decoding, again
- Translation hypergraphs are a lingua franca for translation search spaces
- Note that FST lattices are a special case
- Decoding problem: how do I build a translation hypergraph?

51-52 Representational limits
Consider this very simple SCFG translation model:
Glue rules: S → S S : 1 2 and S → S S : 2 1
Lexical rules: S → … : ate, S → … : John, S → … : an apple (source-side terminals lost in transcription)

53 Representational limits
- Phrase-based decoding runs in exponential time: all permutations of the source are modeled (traveling salesman problem!)
- Typically distortion limits are used to mitigate this
- But parsing is polynomial... what's going on?

54-56 Representational limits
Binary SCFGs cannot model this permutation (however, ternary SCFGs can): source A B C D aligned to target B D A C.
But can't we binarize any grammar? No. Synchronous CFGs cannot, in general, be binarized!

57 Does this matter?
- The forbidden pattern is observed in real data (Melamed, 2003). Does this matter?
- Learning: phrasal units and higher-rank grammars can account for the pattern; sentences can be simplified or ignored
- Translation: the pattern does exist, but how often must it exist (i.e., is there a good translation that doesn't violate the SCFG matching property)?

58 Tree-to-string
- How do we generate a hypergraph for a tree-to-string translation model?
- Simple linear-time (given a fixed translation model) top-down matching algorithm: recursively cover uncovered sites in the tree
- Each node in the input tree becomes a node in the translation forest
- For details, see Huang et al. (AMTA, 2006) and Huang et al. (EMNLP, 2010); a sketch of the matching step follows the worked example below

59 Tree-to-string grammar:
S(x₁:NP x₂:VP) → x₁ x₂
VP(x₁:NP x₂:V) → x₂ x₁
tabeta → ate
ringo-o → an apple
jon-ga → John

60-65 (figures): the input tree S(NP jon-ga, VP(NP ringo-o, V tabeta)) is matched top-down against this grammar; the S and VP rules fire at the internal nodes, the lexical rules cover the leaves, and reading the forest off in target order yields John ate an apple.
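
A sketch of the top-down matching step on exactly this example (a deliberately simplified encoding of my own: rules match one tree level at a time, which suffices for this grammar but not for the multi-level rules real tree-to-string systems allow).

```python
# Simplified tree-to-string translation by top-down rule matching (illustrative).
# A tree node is (label, children); a leaf node's single child is a word string.

tree = ("S", [("NP", ["jon-ga"]),
              ("VP", [("NP", ["ringo-o"]), ("V", ["tabeta"])])])

# Lexical rules: source word -> target string.
lexical = {"tabeta": "ate", "ringo-o": "an apple", "jon-ga": "John"}
# One-level tree rules: (root, child labels) -> target-side permutation of the
# children, e.g. S(x1:NP x2:VP) -> x1 x2 and VP(x1:NP x2:V) -> x2 x1.
tree_rules = {("S", ("NP", "VP")): [0, 1],
              ("VP", ("NP", "V")): [1, 0]}

def translate(node):
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return lexical[children[0]]                      # uncovered leaf: lexical rule
    key = (label, tuple(child[0] for child in children))
    order = tree_rules[key]                              # match a rule at this node
    # Recursively cover the children, then emit them in the rule's target order.
    return " ".join(translate(children[i]) for i in order)

print(translate(tree))   # -> John ate an apple
```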

66 Language Models

67-71 Hypergraph review (figures): the same hypergraph as before, with goal node S and edges la lectura : reading, ayer : yesterday, 1 de 2 : 2 's 1, and 1 de 2 : 1 from 2; the annotations point out the goal node, the source and target labels, and the substitution sites (variables / non-terminals). For LM integration, we ignore the source! The hypergraph derives { yesterday's reading, reading from yesterday }. How can we add the LM score to each string derived by the hypergraph?

72 LM Integration
- If LM features were purely local (a unigram model, a discriminative LM, ...), integration would be a breeze: add an LM feature to every edge
- But LM features are non-local!

73-80 Why is it hard? (figures): consider the edge de : 1 saw 2. Two problems: (1) What is the content of the variables? They could be anything: John / Mary, I think I / Mary, I think I / somebody, ... (2) What will be the left context when this string is substituted somewhere higher up, e.g. into <s> 1 </s> at the goal S, into fortunately, 1, or into 2 1?

81 Naive solution
- Extract all (k-best?) translations from the translation model
- Score them with an LM
- What's the problem with this?

82 Outline of DP solution
- Use the n-order Markov assumption to help us: in an n-gram LM, words more than n words away will not affect the local (conditional) probability of a word in context. This is not generally true, it is just the Markov assumption!
- General approach: restructure the hypergraph so that LM probabilities decompose along edges.
- This solves both problems: we will not know the full value of the variables, but we will know enough; and we defer scoring of the left context until the context is established.
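
For reference, a minimal sketch of the Markov assumption itself, where p stands in for any estimated n-gram conditional and the toy probabilities are invented.

```python
# The n-gram Markov assumption used below: the probability of a sentence
# factors into conditionals on only the previous n-1 words.
import math

def lm_logprob(words, n, p):
    """log P(words) under an n-gram LM, with <s> padding on the left."""
    padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])     # only the previous n-1 words matter
        total += math.log(p(padded[i], context))
    return total

# Toy bigram (n=2) with invented probabilities:
probs = {("John", ("<s>",)): 0.1, ("ate", ("John",)): 0.2,
         ("an", ("ate",)): 0.3, ("apple", ("an",)): 0.4, ("</s>", ("apple",)): 0.5}
p = lambda w, ctx: probs.get((w, ctx), 1e-4)
print(lm_logprob("John ate an apple".split(), 2, p))
```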

83-87 Hypergraph restructuring
Note the following three facts:
- If you know n or more consecutive words, the conditional probabilities of the nth, (n+1)th, ... words can be computed. Therefore: add a feature weight to the edge for those words.
- (n−1) words of context to the left is enough to determine the probability of any word. Therefore: split nodes based on the (n−1) words on the right side of the span dominated by every node.
- (n−1) words on the left side of a span cannot be scored with certainty because the context is not known. Therefore: split nodes based on the (n−1) words on the left side of the span dominated by every node.
In short: split nodes by the (n−1) words on both sides of the convergent edges.

88 Hypergraph restructuring
Algorithm ("cube intersection"):
For each node v (proceeding in topological order through the nodes):
  For each edge e with head node v, compute the (n−1) words on the left and right; call this q_e. Do this by substituting the (n−1)×2 word string from the tail node corresponding to each substitution variable.
  If node v_{q_e} does not exist, create it, duplicating all outgoing edges from v so that they also proceed from v_{q_e}.
  Disconnect e from v and attach it to v_{q_e}.
Delete v.
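
A compact sketch of the state bookkeeping behind this restructuring (my own illustration of the general +LM idea, not the lecture's code): an edge combines its tail nodes' LM states, scores every n-gram whose full context has just become visible, and emits the elided boundary-word state for the new head node.

```python
# LM-state combination for one edge during +LM restructuring (illustrative).
# A node's LM state keeps at most n-1 boundary words on each side; a GAP marker
# stands for elided middle words that were already scored lower in the forest.
import math

GAP = "<GAP>"

def elide(tokens, n):
    """The state q: keep n-1 words on each side, eliding the middle."""
    if len(tokens) < n:
        return tuple(tokens)
    return tuple(tokens[:n - 1]) + (GAP,) + tuple(tokens[-(n - 1):])

def apply_edge(template, tail_states, n, p):
    """template: terminal words and integer tail indices, in target order.
    Returns (LM log-prob newly added on this edge, LM state of the head node)."""
    tokens = []
    for item in template:
        tokens.extend(tail_states[item] if isinstance(item, int) else [item])
    logprob = 0.0
    for i, tok in enumerate(tokens):
        context = tokens[i - n + 1:i] if i >= n - 1 else []
        # Score a word only if its full context is visible here (no GAP in the
        # window) and it was not already scored inside a tail node.
        if tok != GAP and i >= n - 1 and GAP not in context:
            logprob += math.log(p(tok, tuple(context)))
    return logprob, elide(tokens, n)

# Bigram example (invented probabilities): combine "the stain" and "the man"
# through a rule whose target side is 2 's 1, as in the slides.
p = lambda w, ctx: {("'s", ("stain",)): 0.2, ("the", ("'s",)): 0.4}.get((w, ctx), 1e-4)
stain = elide(["the", "stain"], 2)          # ("the", GAP, "stain")
man = elide(["the", "man"], 2)              # ("the", GAP, "man")
print(apply_edge([0, "'s", 1], {0: stain, 1: man}, 2, p))
```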

89-102 Hypergraph restructuring, worked example (figures): a small hypergraph with one span translating to the man or the husband and another translating to la mancha, the stain, or the gray stain, combined through the rules 1 de 2 : 2 's 1 and 1 de 2 : 1 from 2; before adding the LM, the Viterbi derivation is the stain 's the man. Adding a bigram language model, edges pick up scores such as p(mancha | la), p(stain | the), and p(gray | the) · p(stain | gray), and nodes are split by their boundary words (la ... mancha, the ... stain, the ... man, the ... husband). After restructuring, every node remembers enough for edges to compute LM costs.

103-104 Complexity
What is the run-time of this algorithm? O(|V| |E|^{2(n−1)}). Going to longer n-grams is exponentially expensive!

105 Cube pruning
- Expanding every node like this exhaustively is impractical: polynomial time, but really, really big!
- Cube pruning is a minor tweak on the above algorithm: compute only the k-best expansions at each node
- Use an estimate (usually a unigram probability) of the unscored left edge to rank the nodes

106 Cube pruning
- Why "cube" pruning? Cube pruning only involves a cube when arity-2 rules are used!
- It would more appropriately be called square pruning with arity 1, or hypercube pruning with arity > 2!

107-113 Cube pruning, worked example (figures from Huang and Chiang, Forest Rescoring): the k-best hypotheses for VP[1,6] are built from the k-best lists of PP[1,3] (with Sharon / along Sharon / with Shalong) and VP[3,6] (held meeting / held talk / hold conference). The grid of combination costs would be monotonic without the LM; the LM combination costs (e.g. the bigram (meeting, with)) make it non-monotonic. As in k-best parsing (Huang and Chiang, 2005), a priority queue of candidates is kept: repeatedly extract the best candidate and push its two successors.
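
A sketch of this candidate-selection loop for a single two-tail combination (my own illustration following the extract-and-push idea above; the hypotheses, costs, and LM combination function are invented).

```python
# Cube-pruning-style k-best combination of two sorted hypothesis lists
# (illustrative sketch; real decoders also loop over the rules sharing a span).
import heapq

def cube_k_best(left, right, combine_cost, k):
    """left/right: lists of (score, hyp) sorted best-first (lower = better).
    combine_cost(l, r) adds the non-local (e.g. LM) cost of joining them.
    Returns up to k combined hypotheses, approximately best-first."""
    seen = {(0, 0)}
    frontier = [(left[0][0] + right[0][0] + combine_cost(left[0][1], right[0][1]), 0, 0)]
    results = []
    while frontier and len(results) < k:
        cost, i, j = heapq.heappop(frontier)          # extract the best candidate
        results.append((cost, left[i][1], right[j][1]))
        for ni, nj in ((i + 1, j), (i, j + 1)):       # push its two successors
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                c = left[ni][0] + right[nj][0] + combine_cost(left[ni][1], right[nj][1])
                heapq.heappush(frontier, (c, ni, nj))
    return results

# Toy grid: costs are negative log-probs, LM combination costs invented.
vp = [(1.0, "held meeting"), (1.1, "held talk"), (3.5, "hold conference")]
pp = [(0.5, "with Sharon"), (0.8, "along Sharon"), (3.0, "with Shalong")]
lm = lambda l, r: 0.2 if (l, r) == ("held talk", "with Sharon") else 1.0
print(cube_k_best(vp, pp, lm, k=3))
```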

114 Cube pruning
- Widely used for phrase-based and syntax-based MT
- May be applied in conjunction with a bottom-up decoder, or as a second rescoring pass
- Nodes may also be grouped together (for example, all nodes corresponding to a certain source span)
- The requirement for a topological ordering means the translation hypergraph may not have cycles

115 Alignment

116 Hypergraphs as Grammars
- A hypergraph is isomorphic to a (synchronous) CFG
- LM integration can be understood as the intersection of an LM and a CFG; cube pruning approximates this intersection
- Is the algorithm optimal?

117 Constrained decoding
- Wu (1997) gives an algorithm that is a generalization of CKY to SCFGs; it requires an approximation of CNF
- Alternative solution: parse in one language, then parse the other side of the string pair with the hypergraph grammar

118-130 Constrained decoding, worked example (figures; with thanks and apologies to Zhifei Li). Input: <dianzi shang de mao, a cat on the mat>. First parse the source dianzi0 shang1 de2 mao3 with the SCFG; the hypergraph contains edges such as dianzi shang : the mat, mao : a cat, 0 de 1 : 1 on 0, 0 de 1 : 1 of 0, 0 de 1 : 0 's 1, and 0 de 1 : 0 1, with goal S[0,4]. Reading the target sides off gives the isomorphic CFG:
[34] → a cat
[02] → the mat
[04a] → [34] on [02]
[04a] → [34] of [02]
[04b] → [02] 's [34]
[04b] → [02] [34]
[S] → [04a]
[S] → [04b]
Then parse the other side, a cat on the mat, with this grammar: [34] covers a cat, [02] covers the mat, [04a] combines them, and [S] completes the parse.

131 (figure): a plot of seconds per sentence against source length (words), comparing the ITG algorithm with the two-parse algorithm.
