Decoding and Inference with Syntactic Translation Models

1 Decoding and Inference with Syntactic Translation Models March 5, 2013

2-9 CFGs (figures): a derivation is built step by step with the grammar S → NP VP, VP → NP V, plus lexical rules V → …, NP → …, NP → … whose source-language terminals were lost in transcription; the final slide shows the fully derived source string as Output.

10-12 Synchronous CFGs
S → NP VP : 1 2 (monotonic)
VP → NP V : 2 1 (inverted)
V → … : ate
NP → … : John
NP → … : an apple
(the source sides of the lexical rules are terminals in the source language, lost in transcription)
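
To make synchronous generation (illustrated on the next slides) concrete, here is a minimal sketch in plain Python, assuming a toy encoding of the rules above; it is not the lecture's code, and the source terminals jon-ga, ringo-o, tabeta are borrowed from the tree-to-string slides later in the deck.

```python
# Minimal synchronous-CFG generation sketch (illustrative, not the lecture's code).
# Each rule pairs a source RHS with a target RHS; tuples like ("NP", 1) are
# co-indexed nonterminal slots, plain strings are terminals.
import random

RULES = {
    "S":  [([("NP", 1), ("VP", 2)], [("NP", 1), ("VP", 2)])],   # 1 2 (monotonic)
    "VP": [([("NP", 1), ("V", 2)],  [("V", 2),  ("NP", 1)])],   # 2 1 (inverted)
    "V":  [(["tabeta"], ["ate"])],
    "NP": [(["jon-ga"], ["John"]), (["ringo-o"], ["an", "apple"])],
}

def generate(symbol):
    """Expand `symbol`, returning (source words, target words) built in lockstep."""
    src_rhs, tgt_rhs = random.choice(RULES[symbol])
    # Expand each co-indexed nonterminal exactly once; the same expansion is
    # spliced into both sides, possibly in different orders.
    expansions = {idx: generate(nt)
                  for nt, idx in (it for it in src_rhs if isinstance(it, tuple))}

    def realize(rhs, side):
        out = []
        for item in rhs:
            out.extend(expansions[item[1]][side] if isinstance(item, tuple) else [item])
        return out

    return realize(src_rhs, 0), realize(tgt_rhs, 1)

src, tgt = generate("S")
print(" ".join(src), "|||", " ".join(tgt))   # e.g. jon-ga ringo-o tabeta ||| John ate an apple
```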

13-20 Synchronous generation (figures): a source tree and a target tree are grown in lockstep, one synchronous rule at a time (S, then NP VP on both sides, then the inverted VP expansion and the lexical rules), ending with the target yield John ate an apple. Output: ( … : John ate an apple); the source yield was lost in transcription.

21-27 Translation as parsing (figures): parse the source sentence, recognizing NP, NP, and V, then VP, then S; then project the parse to the target side, giving the target tree S(NP, VP(V, NP)) with yield John ate an apple.

28 A closer look at parsing
- Parsing is usually done with dynamic programming: share common computations and structure; represent an exponential number of alternatives in polynomial space
- With SCFGs there are two kinds of ambiguity: source parse ambiguity and translation ambiguity
- Parse forests can represent both!

29 A closer look at parsing
- Any monolingual parser can be used (most often: CKY or CKY variants)
- Parsing complexity is O(n³ |G|³): cubic in the length of the sentence (n³) and cubic in the number of non-terminals (|G|³)
- Adding nonterminal types increases parsing complexity substantially!
- With few NTs, exhaustive parsing is tractable

30 Parsing as deduction
An inference rule has antecedents A : u and B : v, a side condition φ, and a consequent C : w. If A and B are true with weights u and v, and φ is also true, then C is true with weight w.

31 Example: CKY
Inputs: f = ⟨f₁, f₂, ..., f_ℓ⟩, the input sentence; G, a context-free grammar in Chomsky normal form.
Item form: [X, i, j], meaning a subtree rooted with NT type X spanning i to j has been recognized.

32 Example: CKY
Goal: [S, 0, ℓ]
Axioms: [X, i−1, i] : w for each rule (X → f_i) ∈ G with weight w
Inference rule: from [X, i, k] : u and [Y, k, j] : v, derive [Z, i, j] : u·v·w, provided (Z → X Y) ∈ G with weight w
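
A minimal sketch of this weighted CKY recognizer in Python (my own encoding of the grammar, using max-product weights; the toy run uses the I saw her duck grammar from the next slides with invented rule weights).

```python
# Weighted CKY over a CNF grammar, following the deduction rules above:
# axioms [X, i-1, i] for lexical rules X -> f_i, and the inference rule
# [X, i, k] + [Y, k, j] + (Z -> X Y) => [Z, i, j] with weight u * v * w.
from collections import defaultdict

def cky(words, lexical, binary, goal="S"):
    """lexical: dict word -> list of (NT, weight); binary: list of (Z, X, Y, weight).
    Returns the best weight of a goal item spanning the whole sentence."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):                      # axioms
        for nt, weight in lexical.get(w, []):
            chart[(i, i + 1)][nt] = max(chart[(i, i + 1)].get(nt, 0.0), weight)
    for span in range(2, n + 1):                       # wider spans, bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for z, x, y, w in binary:
                    u = chart[(i, k)].get(x)
                    v = chart[(k, j)].get(y)
                    if u is not None and v is not None:
                        score = u * v * w
                        if score > chart[(i, j)].get(z, 0.0):
                            chart[(i, j)][z] = score
    return chart[(0, n)].get(goal)

# Toy run on the "I saw her duck" grammar from the next slides (weights invented):
lexical = {"I": [("PRP", 1.0)], "saw": [("V", 1.0)],
           "her": [("PRP", 0.6)], "duck": [("NN", 0.5), ("V", 0.5)]}
binary = [("S", "PRP", "VP", 1.0), ("VP", "V", "NP", 0.6), ("VP", "V", "SBAR", 0.4),
          ("NP", "PRP", "NN", 1.0), ("SBAR", "PRP", "V", 1.0)]
print(cky("I saw her duck".split(), lexical, binary))  # best weight of [S, 0, 4]
```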

33-44 Worked example (figures): CKY on the sentence I saw her duck, with the grammar S → PRP VP, VP → V NP, VP → V SBAR, SBAR → PRP V, NP → PRP NN plus lexical rules for PRP, V, and NN. Starting from the axioms PRP[0,1], V[1,2], PRP[2,3], NN[3,4], V[3,4], the chart fills bottom-up with NP[2,4], SBAR[2,4], VP[1,4], and finally the goal item S[0,4]. The last slide shows just the items and the edges that derived them and asks: what is this object?

45 Semantics of hypergraphs
- A generalization of directed graphs
- A special node is designated the goal
- Every edge has a single head and 0 or more tails (the arity of the edge is the number of tails)
- Node labels correspond to LHSs of CFG rules
- A derivation is the generalization of the graph concept of a path to hypergraphs
- Weights multiply along edges in the derivation, and add at nodes (cf. semiring parsing)
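
One way this object might be encoded, as a point of reference for the algorithms below (an illustrative sketch of my own, not the data structures of any particular decoder).

```python
# A bare-bones translation-hypergraph encoding (illustrative; real decoders such
# as cdec or Joshua use richer structures).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Edge:
    head: int                 # index of the head node
    tails: List[int]          # indices of tail nodes; len(tails) is the edge's arity
    src: str                  # source-side label, e.g. "1 de 2"
    tgt: str                  # target-side label, e.g. "2 's 1"
    weight: float = 1.0       # rule/feature weight; multiplies along a derivation

@dataclass
class Node:
    label: str                # LHS nonterminal of the CFG rule(s), e.g. "NP[0,2]"
    in_edges: List[int] = field(default_factory=list)   # edges whose head is this node

@dataclass
class Hypergraph:
    nodes: List[Node]
    edges: List[Edge]
    goal: int                 # index of the designated goal node

    def add_edge(self, edge: Edge) -> int:
        self.edges.append(edge)
        self.nodes[edge.head].in_edges.append(len(self.edges) - 1)
        return len(self.edges) - 1
```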

46 Edge labels
- Edge labels may be a mix of terminals and substitution sites (non-terminals)
- In translation hypergraphs, edges are labeled in both the source and target languages
- The number of substitution sites must equal the arity of the edge and must be the same in both languages
- The two languages may have different orders of the substitution sites
- There is no restriction on the number of terminal symbols

47 Edge labels (figure): a hypergraph with goal node S and edges labeled la lectura : reading, ayer : yesterday, 1 de 2 : 2 's 1, and 1 de 2 : 1 from 2, deriving the pairs {( … : yesterday's reading ), ( … : reading from yesterday )}; the source strings of the pairs were lost in transcription.

48 Inference algorithms
- Viterbi, O(|E| + |V|): find the maximum-weight derivation; requires a partial ordering of weights
- Inside-outside, O(|E| + |V|): compute the marginal (sum) weight of all derivations passing through each edge/node
- k-best derivations, O(|E| + |D_max| k log k): enumerate the k best derivations in the hypergraph; see the IWPT paper by Huang and Chiang (2005)
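
Sketches of the first two algorithms over the toy Hypergraph class above, assuming the nodes are stored in topological order (every tail before its head); this is my illustration, not the Huang and Chiang implementation.

```python
# Viterbi and inside weights over a hypergraph whose nodes are topologically
# ordered (every tail index < head index). Both run in O(|E| + |V|).

def viterbi(hg):
    """Best (max-product) derivation weight of each node; returns the goal's."""
    best = [0.0] * len(hg.nodes)
    for v, node in enumerate(hg.nodes):
        if not node.in_edges:                    # leaf/axiom node
            best[v] = 1.0
            continue
        for e_id in node.in_edges:
            e = hg.edges[e_id]
            w = e.weight
            for t in e.tails:
                w *= best[t]                     # weights multiply along the derivation
            best[v] = max(best[v], w)            # max over incoming edges
    return best[hg.goal]

def inside(hg):
    """Marginal (sum) weight of all derivations of each node; returns the goal's."""
    total = [0.0] * len(hg.nodes)
    for v, node in enumerate(hg.nodes):
        if not node.in_edges:
            total[v] = 1.0
            continue
        for e_id in node.in_edges:
            e = hg.edges[e_id]
            w = e.weight
            for t in e.tails:
                w *= total[t]
            total[v] += w                        # sum over incoming edges
    return total[hg.goal]

# Tiny usage: goal node 2 with one binary edge over two leaves.
hg = Hypergraph(nodes=[Node("NP"), Node("VP"), Node("S")], edges=[], goal=2)
hg.add_edge(Edge(head=2, tails=[0, 1], src="1 2", tgt="1 2", weight=0.5))
print(viterbi(hg), inside(hg))   # 0.5 0.5
```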

49 Things to keep in mind
- Bound on the number of edges: |E| ∈ O(n³ |G|³)
- Bound on the number of nodes: |V| ∈ O(n² |G|)

50 Decoding, again
- Translation hypergraphs are a lingua franca for translation search spaces
- Note that FST lattices are a special case
- Decoding problem: how do I build a translation hypergraph?

51-52 Representational limits
Consider this very simple SCFG translation model:
Glue rules: S → S S : 1 2 and S → S S : 2 1
Lexical rules: S → … : ate, S → … : John, S → … : an apple (source-side terminals lost in transcription)

53 Representational limits
- Phrase-based decoding runs in exponential time: all permutations of the source are modeled (traveling salesman problem!)
- Typically distortion limits are used to mitigate this
- But parsing is polynomial... what's going on?

54-56 Representational limits
Binary SCFGs cannot model this permutation (however, ternary SCFGs can): source A B C D aligned to target B D A C.
But can't we binarize any grammar? No. Synchronous CFGs cannot, in general, be binarized!

57 Does this matter?
- The forbidden pattern is observed in real data (Melamed, 2003). Does this matter?
- Learning: phrasal units and higher-rank grammars can account for the pattern; sentences can be simplified or ignored
- Translation: the pattern does exist, but how often must it exist (i.e., is there a good translation that doesn't violate the SCFG matching property)?

58 Tree-to-string
- How do we generate a hypergraph for a tree-to-string translation model?
- Simple linear-time (given a fixed translation model) top-down matching algorithm: recursively cover uncovered sites in the tree
- Each node in the input tree becomes a node in the translation forest
- For details, see Huang et al. (AMTA, 2006) and Huang et al. (EMNLP, 2010); a sketch of the matching step follows the worked example below

59 Tree-to-string grammar:
S(x₁:NP x₂:VP) → x₁ x₂
VP(x₁:NP x₂:V) → x₂ x₁
tabeta → ate
ringo-o → an apple
jon-ga → John

60-65 (figures): the input tree S(NP jon-ga, VP(NP ringo-o, V tabeta)) is matched top-down against this grammar; the S and VP rules fire at the internal nodes, the lexical rules cover the leaves, and reading the forest off in target order yields John ate an apple.
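
A sketch of the top-down matching step on exactly this example (a deliberately simplified encoding of my own: rules match one tree level at a time, which suffices for this grammar but not for the multi-level rules real tree-to-string systems allow).

```python
# Simplified tree-to-string translation by top-down rule matching (illustrative).
# A tree node is (label, children); a leaf node's single child is a word string.

tree = ("S", [("NP", ["jon-ga"]),
              ("VP", [("NP", ["ringo-o"]), ("V", ["tabeta"])])])

# Lexical rules: source word -> target string.
lexical = {"tabeta": "ate", "ringo-o": "an apple", "jon-ga": "John"}
# One-level tree rules: (root, child labels) -> target-side permutation of the
# children, e.g. S(x1:NP x2:VP) -> x1 x2 and VP(x1:NP x2:V) -> x2 x1.
tree_rules = {("S", ("NP", "VP")): [0, 1],
              ("VP", ("NP", "V")): [1, 0]}

def translate(node):
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return lexical[children[0]]                      # uncovered leaf: lexical rule
    key = (label, tuple(child[0] for child in children))
    order = tree_rules[key]                              # match a rule at this node
    # Recursively cover the children, then emit them in the rule's target order.
    return " ".join(translate(children[i]) for i in order)

print(translate(tree))   # -> John ate an apple
```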

66 Language Models

67-71 Hypergraph review (figures): the same hypergraph as before, with goal node S and edges la lectura : reading, ayer : yesterday, 1 de 2 : 2 's 1, and 1 de 2 : 1 from 2; the annotations point out the goal node, the source and target labels, and the substitution sites (variables / non-terminals). For LM integration, we ignore the source! The hypergraph derives { yesterday's reading, reading from yesterday }. How can we add the LM score to each string derived by the hypergraph?

72 LM Integration
- If LM features were purely local (a unigram model, a discriminative LM, ...), integration would be a breeze: add an LM feature to every edge
- But LM features are non-local!

73-80 Why is it hard? (figures): consider the edge de : 1 saw 2. Two problems: (1) What is the content of the variables? They could be anything: John / Mary, I think I / Mary, I think I / somebody, ... (2) What will be the left context when this string is substituted somewhere higher up, e.g. into <s> 1 </s> at the goal S, into fortunately, 1, or into 2 1?

81 Naive solution
- Extract all (k-best?) translations from the translation model
- Score them with an LM
- What's the problem with this?

82 Outline of DP solution
- Use the n-order Markov assumption to help us: in an n-gram LM, words more than n words away will not affect the local (conditional) probability of a word in context. This is not generally true, it is just the Markov assumption!
- General approach: restructure the hypergraph so that LM probabilities decompose along edges.
- This solves both problems: we will not know the full value of the variables, but we will know enough; and we defer scoring of the left context until the context is established.
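
For reference, a minimal sketch of the Markov assumption itself, where p stands in for any estimated n-gram conditional and the toy probabilities are invented.

```python
# The n-gram Markov assumption used below: the probability of a sentence
# factors into conditionals on only the previous n-1 words.
import math

def lm_logprob(words, n, p):
    """log P(words) under an n-gram LM, with <s> padding on the left."""
    padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])     # only the previous n-1 words matter
        total += math.log(p(padded[i], context))
    return total

# Toy bigram (n=2) with invented probabilities:
probs = {("John", ("<s>",)): 0.1, ("ate", ("John",)): 0.2,
         ("an", ("ate",)): 0.3, ("apple", ("an",)): 0.4, ("</s>", ("apple",)): 0.5}
p = lambda w, ctx: probs.get((w, ctx), 1e-4)
print(lm_logprob("John ate an apple".split(), 2, p))
```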

83-87 Hypergraph restructuring
Note the following three facts:
- If you know n or more consecutive words, the conditional probabilities of the nth, (n+1)th, ... words can be computed. Therefore: add a feature weight to the edge for those words.
- (n−1) words of context to the left is enough to determine the probability of any word. Therefore: split nodes based on the (n−1) words on the right side of the span dominated by every node.
- (n−1) words on the left side of a span cannot be scored with certainty because the context is not known. Therefore: split nodes based on the (n−1) words on the left side of the span dominated by every node.
In short: split nodes by the (n−1) words on both sides of the convergent edges.

88 Hypergraph restructuring
Algorithm ("cube intersection"):
For each node v (proceeding in topological order through the nodes):
  For each edge e with head node v, compute the (n−1) words on the left and right; call this q_e. Do this by substituting the (n−1)×2 word string from the tail node corresponding to each substitution variable.
  If node v_{q_e} does not exist, create it, duplicating all outgoing edges from v so that they also proceed from v_{q_e}.
  Disconnect e from v and attach it to v_{q_e}.
Delete v.
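
A compact sketch of the state bookkeeping behind this restructuring (my own illustration of the general +LM idea, not the lecture's code): an edge combines its tail nodes' LM states, scores every n-gram whose full context has just become visible, and emits the elided boundary-word state for the new head node.

```python
# LM-state combination for one edge during +LM restructuring (illustrative).
# A node's LM state keeps at most n-1 boundary words on each side; a GAP marker
# stands for elided middle words that were already scored lower in the forest.
import math

GAP = "<GAP>"

def elide(tokens, n):
    """The state q: keep n-1 words on each side, eliding the middle."""
    if len(tokens) < n:
        return tuple(tokens)
    return tuple(tokens[:n - 1]) + (GAP,) + tuple(tokens[-(n - 1):])

def apply_edge(template, tail_states, n, p):
    """template: terminal words and integer tail indices, in target order.
    Returns (LM log-prob newly added on this edge, LM state of the head node)."""
    tokens = []
    for item in template:
        tokens.extend(tail_states[item] if isinstance(item, int) else [item])
    logprob = 0.0
    for i, tok in enumerate(tokens):
        context = tokens[i - n + 1:i] if i >= n - 1 else []
        # Score a word only if its full context is visible here (no GAP in the
        # window) and it was not already scored inside a tail node.
        if tok != GAP and i >= n - 1 and GAP not in context:
            logprob += math.log(p(tok, tuple(context)))
    return logprob, elide(tokens, n)

# Bigram example (invented probabilities): combine "the stain" and "the man"
# through a rule whose target side is 2 's 1, as in the slides.
p = lambda w, ctx: {("'s", ("stain",)): 0.2, ("the", ("'s",)): 0.4}.get((w, ctx), 1e-4)
stain = elide(["the", "stain"], 2)          # ("the", GAP, "stain")
man = elide(["the", "man"], 2)              # ("the", GAP, "man")
print(apply_edge([0, "'s", 1], {0: stain, 1: man}, 2, p))
```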

89-102 Hypergraph restructuring, worked example (figures): a small hypergraph with one span translating to the man or the husband and another translating to la mancha, the stain, or the gray stain, combined through the rules 1 de 2 : 2 's 1 and 1 de 2 : 1 from 2; before adding the LM, the Viterbi derivation is the stain 's the man. Adding a bigram language model, edges pick up scores such as p(mancha | la), p(stain | the), and p(gray | the) · p(stain | gray), and nodes are split by their boundary words (la ... mancha, the ... stain, the ... man, the ... husband). After restructuring, every node remembers enough for edges to compute LM costs.

103-104 Complexity
What is the run-time of this algorithm? O(|V| |E|^{2(n−1)}). Going to longer n-grams is exponentially expensive!

105 Cube pruning
- Expanding every node like this exhaustively is impractical: polynomial time, but really, really big!
- Cube pruning is a minor tweak on the above algorithm: compute only the k-best expansions at each node
- Use an estimate (usually a unigram probability) of the unscored left edge to rank the nodes

106 Cube pruning
- Why "cube" pruning? Cube pruning only involves a cube when arity-2 rules are used!
- It would more appropriately be called square pruning with arity 1, or hypercube pruning with arity > 2!

107-113 Cube pruning, worked example (figures from Huang and Chiang, Forest Rescoring): the k-best hypotheses for VP[1,6] are built from the k-best lists of PP[1,3] (with Sharon / along Sharon / with Shalong) and VP[3,6] (held meeting / held talk / hold conference). The grid of combination costs would be monotonic without the LM; the LM combination costs (e.g. the bigram (meeting, with)) make it non-monotonic. As in k-best parsing (Huang and Chiang, 2005), a priority queue of candidates is kept: repeatedly extract the best candidate and push its two successors.
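
A sketch of this candidate-selection loop for a single two-tail combination (my own illustration following the extract-and-push idea above; the hypotheses, costs, and LM combination function are invented).

```python
# Cube-pruning-style k-best combination of two sorted hypothesis lists
# (illustrative sketch; real decoders also loop over the rules sharing a span).
import heapq

def cube_k_best(left, right, combine_cost, k):
    """left/right: lists of (score, hyp) sorted best-first (lower = better).
    combine_cost(l, r) adds the non-local (e.g. LM) cost of joining them.
    Returns up to k combined hypotheses, approximately best-first."""
    seen = {(0, 0)}
    frontier = [(left[0][0] + right[0][0] + combine_cost(left[0][1], right[0][1]), 0, 0)]
    results = []
    while frontier and len(results) < k:
        cost, i, j = heapq.heappop(frontier)          # extract the best candidate
        results.append((cost, left[i][1], right[j][1]))
        for ni, nj in ((i + 1, j), (i, j + 1)):       # push its two successors
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                c = left[ni][0] + right[nj][0] + combine_cost(left[ni][1], right[nj][1])
                heapq.heappush(frontier, (c, ni, nj))
    return results

# Toy grid: costs are negative log-probs, LM combination costs invented.
vp = [(1.0, "held meeting"), (1.1, "held talk"), (3.5, "hold conference")]
pp = [(0.5, "with Sharon"), (0.8, "along Sharon"), (3.0, "with Shalong")]
lm = lambda l, r: 0.2 if (l, r) == ("held talk", "with Sharon") else 1.0
print(cube_k_best(vp, pp, lm, k=3))
```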

114 Cube pruning
- Widely used for phrase-based and syntax-based MT
- May be applied in conjunction with a bottom-up decoder, or as a second rescoring pass
- Nodes may also be grouped together (for example, all nodes corresponding to a certain source span)
- The requirement for a topological ordering means the translation hypergraph may not have cycles

115 Alignment

116 Hypergraphs as Grammars
- A hypergraph is isomorphic to a (synchronous) CFG
- LM integration can be understood as the intersection of an LM and a CFG; cube pruning approximates this intersection
- Is the algorithm optimal?

117 Constrained decoding
- Wu (1997) gives an algorithm that is a generalization of CKY to SCFGs; it requires an approximation of CNF
- Alternative solution: parse in one language, then parse the other side of the string pair with the hypergraph grammar

118-130 Constrained decoding, worked example (figures; with thanks and apologies to Zhifei Li). Input: <dianzi shang de mao, a cat on the mat>. First parse the source dianzi0 shang1 de2 mao3 with the SCFG; the hypergraph contains edges such as dianzi shang : the mat, mao : a cat, 0 de 1 : 1 on 0, 0 de 1 : 1 of 0, 0 de 1 : 0 's 1, and 0 de 1 : 0 1, with goal S[0,4]. Reading the target sides off gives the isomorphic CFG:
[34] → a cat
[02] → the mat
[04a] → [34] on [02]
[04a] → [34] of [02]
[04b] → [02] 's [34]
[04b] → [02] [34]
[S] → [04a]
[S] → [04b]
Then parse the other side, a cat on the mat, with this grammar: [34] covers a cat, [02] covers the mat, [04a] combines them, and [S] completes the parse.

131 (figure): a plot of seconds per sentence against source length (words), comparing the ITG algorithm with the two-parse algorithm.
