Syntax-Based Decoding


Philipp Koehn, 9 November 2017

Syntax-Based Models

Synchronous Context Free Grammar Rules

Nonterminal rules:
  NP → DET₁ NN₂ JJ₃ | DET₁ JJ₃ NN₂

Terminal rules:
  N → maison | house
  NP → la maison bleue | the blue house

Mixed rules:
  NP → la maison JJ₁ | the JJ₁ house
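To make the notation concrete, here is a minimal sketch of how such synchronous rules could be represented in Python. The class and field names are illustrative, not from the slides; nonterminals are (label, index) pairs, and the shared index links the two sides.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SCFGRule:
        # A synchronous CFG rule: one left-hand side, two linked
        # right-hand sides (illustrative layout, not from the slides).
        lhs: str     # left-hand-side nonterminal, e.g. "NP"
        src: tuple   # source side: words and (label, index) pairs
        tgt: tuple   # target side: words and (label, index) pairs

    # Nonterminal rule with reordering: NP -> DET1 NN2 JJ3 | DET1 JJ3 NN2
    reorder = SCFGRule("NP",
                       (("DET", 1), ("NN", 2), ("JJ", 3)),
                       (("DET", 1), ("JJ", 3), ("NN", 2)))

    # Mixed rule: NP -> la maison JJ1 | the JJ1 house
    mixed = SCFGRule("NP",
                     ("la", "maison", ("JJ", 1)),
                     ("the", ("JJ", 1), "house"))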

Extracting Minimal Rules

[Figure: English parse tree (S, VP, PP, NP over PRP, MD, VB, VBG, RP, TO, DT) aligned to the German sentence]
  I shall be passing on to you some comments
  Ich werde Ihnen die entsprechenden Anmerkungen aushändigen

Extracted rule: S → X₁ X₂ | PRP₁ VP₂
Note: one rule per alignable constituent.
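The note "one rule per alignable constituent" relies on a consistency check: a constituent's source span must map to a contiguous target span that no outside source word aligns into. A minimal sketch of this check, assuming word alignments given as (source index, target index) pairs (function names are mine, and this simplifies the usual extraction conditions):

    def aligned_span(align, src_span):
        # Target positions aligned to source positions in src_span.
        pts = [t for s, t in align if src_span[0] <= s < src_span[1]]
        return (min(pts), max(pts) + 1) if pts else None

    def is_alignable(align, src_span):
        # Simplified condition: the constituent's target span exists and
        # no source word outside the constituent aligns into it.
        tgt = aligned_span(align, src_span)
        if tgt is None:
            return False
        return not any(tgt[0] <= t < tgt[1] and not (src_span[0] <= s < src_span[1])
                       for s, t in align)

    align = [(0, 0), (1, 1), (6, 3), (7, 4)]   # (source, target) pairs
    print(is_alignable(align, (6, 8)))          # True: contiguous target span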

Decoding

Syntactic Decoding

Inspired by monolingual syntactic chart parsing: during decoding of the source sentence, a chart with translations for the O(n²) spans has to be filled.

[Figure: German input Sie will eine Tasse Kaffee trinken with POS tags (PPER, VAFIN, ART, VVINF) and syntax tree spans NP, VP, S]

Syntax Decoding

[Figures: the derivation is built up step by step over the German input sentence with its tree]

- German input sentence with tree.
- Purely lexical rules fill a span with a translation (a constituent in the chart): PRO: she over Sie, VB: drink over trinken, NN: coffee over Kaffee.
- A complex rule matches underlying constituent spans and covers words, building NP: a cup of coffee (with DET: a and IN: of underneath).
- A complex rule with reordering builds the VP: wants to drink a cup of coffee (VBZ: wants, TO: to).
- Finally, an S hypothesis covers the whole sentence: she wants to drink a cup of coffee.

Bottom-Up Decoding

For each span, a stack of (partial) translations is maintained. Bottom-up: a higher stack is filled once the underlying stacks are complete.

Chart Organization

[Figure: chart of cells over the input Sie will eine Tasse Kaffee trinken, with NP, VP, and S entries]

The chart consists of cells that cover contiguous spans over the input sentence. Each cell contains a set of hypotheses. A hypothesis is a translation of a span with a target-side constituent label. (In the book, hypotheses are called chart entries.)
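A minimal sketch of this chart organization in Python (the names are illustrative, not from the slides):

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        label: str        # target-side constituent label, e.g. "NP"
        translation: str  # target string for the covered span
        score: float      # model score, higher is better; used for pruning

    # chart[(start, end)] holds the hypotheses for span [start, end)
    chart = defaultdict(list)
    chart[(4, 5)].append(Hypothesis("NN", "coffee", -0.5))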

Naive Algorithm

Input: foreign sentence f = f1, ..., f_lf, with syntax tree
Output: English translation e

for all spans [start, end] (bottom up) do
   for all sequences s of hypotheses and words in span [start, end] do
      for all rules r do
         if rule r applies to chart sequence s then
            create new hypothesis c
            add hypothesis c to chart
         end if
      end for
   end for
end for
return English translation e from best hypothesis in span [0, l_f]
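A rough executable rendering of this pseudocode, continuing the chart sketch above. The rule format (source symbols, target label, target template, score), where integers in the template index matched nonterminal positions, and all helper names are my own, not from the slides:

    def seqs(chart, f, start, end):
        # Enumerate every way to cover [start, end) with a sequence of
        # input words and/or hypotheses over sub-spans (exponential!).
        if start == end:
            yield ()
            return
        for rest in seqs(chart, f, start + 1, end):   # next item: the word f[start]
            yield (f[start],) + rest
        for mid in range(start + 1, end + 1):         # next item: a hypothesis
            for h in chart[(start, mid)]:
                for rest in seqs(chart, f, mid, end):
                    yield (h,) + rest

    def matches(src, seq):
        # A rule's source side matches if, position by position, words
        # equal words and nonterminal labels equal hypothesis labels.
        return len(src) == len(seq) and all(
            sym == (item.label if isinstance(item, Hypothesis) else item)
            for sym, item in zip(src, seq))

    # Example rule: NP -> NP1 des NN2 | NP1 of the NN2
    # rules = [(("NP", "des", "NN"), "NP", (0, "of", "the", 2), -0.7)]
    def naive_decode(f, rules, chart):
        n = len(f)
        for length in range(1, n + 1):                # spans, bottom-up
            for start in range(n - length + 1):
                end = start + length
                # snapshot: hypotheses created here are not re-fed to
                # unary rules within the same span in this sketch
                for s in list(seqs(chart, f, start, end)):
                    for src, label, template, score in rules:
                        if not matches(src, s):
                            continue
                        words, total = [], score
                        for t in template:            # build target string
                            if isinstance(t, int):    # index of a matched nonterminal
                                words.append(s[t].translation)
                                total += s[t].score
                            else:                     # literal target word
                                words.append(t)
                        chart[(start, end)].append(
                            Hypothesis(label, " ".join(words), total))
        return max(chart[(0, n)], key=lambda h: h.score).translation

Running this on even medium-length sentences is hopeless, which is exactly the point of the blow-up slide below.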

Stack Pruning

- The number of hypotheses in each chart cell explodes.
- Dynamic programming (recombination) is not enough: we need to discard bad hypotheses, e.g., keep only the 100 best.
- Different stacks for different output constituent labels?
- Cost estimates: the translation model cost is known; the language model cost for internal words is known; for the initial words we use estimates. Outside cost estimate? (How useful will an NP covering input words 3-5 be later on?)
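A sketch of recombination plus histogram pruning for one cell, reusing the Hypothesis class above. The limit of 100 follows the slide; the recombination key (same label, same language-model boundary words) is a common choice, and the helper names are mine:

    def lm_state(text, order=3):
        # What an n-gram LM still needs to see: first and last n-1 words.
        w = text.split()
        return (tuple(w[:order - 1]), tuple(w[-(order - 1):]))

    def prune_cell(hyps, limit=100):
        # Recombination: hypotheses indistinguishable for future rule
        # applications and LM scoring are merged, keeping the best one.
        best = {}
        for h in hyps:
            key = (h.label, lm_state(h.translation))
            if key not in best or h.score > best[key].score:
                best[key] = h
        # Histogram pruning: keep at most `limit` survivors.
        return sorted(best.values(), key=lambda h: h.score, reverse=True)[:limit]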

Naive Algorithm: Blow-ups

- Many subspan sequences: "for all sequences s of hypotheses and words in span [start, end]"
- Many rules: "for all rules r"
- Checking if a rule applies is not trivial: "rule r applies to chart sequence s"

The naive algorithm is unworkable.

Solution

- Prefix tree data structure for rules
- Dotted rules
- Cube pruning

Storing Rules Efficiently

Storing Rules

First concern: do they apply to the span? We have to match available hypotheses and input words.

Example rule: NP → X₁ des X₂ | NP₁ of the NN₂

Check for applicability:
- Is there an initial sub-span with a hypothesis with constituent label NP?
- Is it followed by a sub-span over the word des?
- Is it followed by a final sub-span with a hypothesis with label NN?

Sequence of relevant information: NP • des • NN → NP₁ of the NN₂

Rule Applicability Check

[Figures: applying the rule NP • des • NN → NP₁ of the NN₂ to a span of six words]

- Trying to cover a span of six words with the given rule.
- First: check for hypotheses with output constituent label NP.
- Found an NP hypothesis in a cell: matched the first symbol of the rule.
- Matched the word des: matched the second symbol of the rule.
- Found an NN hypothesis in a cell: matched the last symbol of the rule.
- Matched the entire rule: apply it to create an NP hypothesis.
- Look up the output words to create the new hypothesis, e.g. NP: the house + des + NN: architect Frank Gehry → NP: the house of the architect Frank Gehry. (Note: there may be many matching underlying NP and NN hypotheses.)

Checking Rules vs. Finding Rules

What we showed: given a rule, check if and how it can be applied. But there are too many rules (millions) to check them all. Instead: given the underlying chart cells and input words, find which rules apply.

Prefix Tree for Rules

[Figure: rules stored in a prefix tree, indexed by their source-side symbol sequences; paths such as NP → DET → NN, NP → des → NN, DET → NN, and das → Haus lead to nodes that list the target sides of all rules with that source side]

Highlighted rules include:
  NP → NP₁ DET₂ NN₃ | NP₁ IN₂ NN₃
  NP → NP₁ | NP₁
  NP → NP₁ des NN₂ | NP₁ of the NN₂
  NP → DET₁ NN₂ | DET₁ NN₂
  NP → das Haus | the house
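A minimal sketch of such a rule prefix tree, with an illustrative node layout and the rule format from the earlier sketch (all names mine):

    class TrieNode:
        # Arcs are labeled with source-side symbols (input words or
        # constituent labels); a node stores the target sides of all
        # rules whose source side ends exactly here.
        def __init__(self):
            self.arcs = {}    # symbol -> TrieNode
            self.rules = []   # (target label, target template, score)

    def insert_rule(root, src, rule):
        node = root
        for sym in src:
            node = node.arcs.setdefault(sym, TrieNode())
        node.rules.append(rule)

    root = TrieNode()
    insert_rule(root, ("NP", "des", "NN"), ("NP", (0, "of", "the", 2), -0.7))
    insert_rule(root, ("das", "Haus"), ("NP", ("the", "house"), -0.2))

Lookup walks the arcs symbol by symbol, so all rules sharing a source-side prefix share the work of matching it.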

Dotted Rules

Dotted Rules: Key Insight

If we can apply a rule like p: A B C → x to a span, then we could also have applied a rule like q: A B → y to a sub-span with the same starting word. We can re-use the rule lookup by storing the dotted rule A B •.
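In terms of the trie sketch above, a dotted rule can be represented as a trie node plus the span it covers; extending the dot is then a single arc lookup (illustrative, continuing the earlier sketch):

    from dataclasses import dataclass

    @dataclass
    class DottedRule:
        node: TrieNode   # how far into the prefix tree we have matched
        start: int       # where the matched span begins
        end: int         # where it ends (exclusive)

    def extend(d, symbol, new_end):
        # Advance the dot over one more symbol (an input word or a
        # constituent label from a neighboring cell), if possible.
        nxt = d.node.arcs.get(symbol)
        return DottedRule(nxt, d.start, new_end) if nxt else None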

Finding Applicable Rules in Prefix Tree

[Figures: a step-by-step walkthrough over the input das Haus des Architekten Frank Gehry]

Covering the first cell, the span over das:
- Looking up rules in the prefix tree: follow the arc labeled das from the root.
- Taking note of the dotted rule: record das • for this span.
- Checking if the dotted rule has translations: the node lists DET: the and DET: that.
- Applying the translation rules: the hypotheses DET: that and DET: the are added to the cell.
- Looking up the constituent label DET in the prefix tree and adding the dotted rule DET • to the span's list of dotted rules.

Moving on to the next cell, the span over Haus:
- Looking up rules in the prefix tree: follow the arc labeled Haus.
- Taking note of the dotted rule: record Haus •.
- Checking if the dotted rule has translations: the node lists NN: house and NP: house.
- Applying the translation rules: the hypotheses NN: house and NP: house are added to the cell.
- Looking up the constituent labels in the prefix tree and adding the dotted rules NN • and NP •.

More of the same fills the remaining cells: over des (IN: of, DET: the), over Architekten (NP: architect, NN: architect), and over Frank and Gehry (NNP: Frank, NNP: Gehry), each with its dotted rules. Then decoding moves on to the next, larger spans.

Covering a Longer Span

We cannot consume multiple words at once: all rules are extensions of existing dotted rules. Here, only extensions of the span over das are possible.

Extensions of the Span over das

- The dotted rules das • and DET • can each be extended with what covers the neighboring cell: the constituent labels NN and NP, or the word Haus.
- Looking up rules in the prefix tree and taking note of the dotted rules yields das Haus, das NN, DET Haus, and DET NN.
- Checking if the dotted rules have translations: das Haus → NP: the house; das NN → NP: the NN; DET Haus → NP: DET house; DET NN → NP: DET NN.
- Applying the translation rules creates the hypotheses NP: that house and NP: the house for the span das Haus.
- Looking up the constituent label NP in the prefix tree and adding the dotted rule NP • to the span's list completes the cell.

Even Larger Spans

Extend the lists of dotted rules with cell constituent labels: combine a sub-span's dotted rules (with the same start) with the constituent labels of hypotheses in the neighboring sub-span (with the same end), as sketched below.
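Continuing the sketches above, this extension step could look as follows; length-one dotted rules are assumed to be seeded directly from each cell's word and hypothesis labels (all names mine):

    def extend_dotted(chart, dotted, f, start, end):
        # Dotted rules for [start, end): extend a dotted rule over a
        # prefix [start, mid) by one more symbol covering [mid, end),
        # either the input word itself or a hypothesis label.
        out = []
        for mid in range(start + 1, end):
            for d in dotted[(start, mid)]:
                if end - mid == 1:                        # one more input word
                    nd = extend(d, f[mid], end)
                    if nd:
                        out.append(nd)
                for label in {h.label for h in chart[(mid, end)]}:
                    nd = extend(d, label, end)
                    if nd:
                        out.append(nd)
        return out

Completed rules at a resulting dotted rule's trie node (nd.node.rules) are then applied, as in the earlier sketches, to create new hypotheses for the span.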

Reflections

Complexity is O(rn³), with sentence length n and size r of the dotted rule lists:
- we may introduce a maximum size for spans that do not start at the beginning of the sentence
- we may limit the size of the dotted rule list (very arbitrary)

Does the list of dotted rules explode? Yes, if there are many rules with neighboring input non-terminals: such rules apply in many places. Rules with words are much more restricted.

Difficult Rules

Some rules may apply in too many ways.

Neighboring input non-terminals, e.g. NP → X₁ X₂ | NP₂ to NP₁:
- the non-terminals may match many different pairs of spans
- especially a problem for hierarchical models (no constituent label restrictions)
- may be okay for syntax models

Three neighboring input non-terminals, e.g. VP → trifft X₁ X₂ X₃ heute | meets NP₁ today PP₂ PP₃:
- this will get out of hand even for syntax models

Summary

- Basic idea: bottom-up chart parsing
- Prefix tree structure for easy rule access
- Caching rule matching with dotted rules

Coming up:
- cube pruning for syntax-based decoding
- recombination and state
- scope-3 pruning
- recursive CKY+
- coarse-to-fine
