Parsing beyond context-free grammar: Parsing Multiple Context-Free Grammars

Laura Kallmeyer, Wolfgang Maier
University of Tübingen
ESSLLI Course 2008

Overview

1. Multiple Context-Free Grammars
2. CYK Parsing
   (a) The basic algorithm
   (b) The naïve algorithm
   (c) The active algorithm
   (d) The incremental algorithm
   (e) Prediction strategies
3. Conclusion

Multiple Context-Free Grammars (1)

Seki et al. (1991), Seki & Kato (2008)

Motivation: describe discontinuity.

Idea: Non-terminal symbols can span a tuple of strings that need not be adjacent in the input string.

Rewrite rules have the form $A_0 \to f[A_1, \ldots, A_q]$, where $f$ is a function describing how to compute an $A_0$-tuple from tuples satisfying $A_1, \ldots, A_q$, such that
- each component of the value of $f$ is a concatenation of some constant strings and some components of its arguments;
- no component of the rhs arguments appears in the value of $f$ more than once.

Multiple Context-Free Grammars (2)

An MCFG is a 5-tuple $\langle N, T, F, P, S \rangle$ where
- $N$ is a finite set of non-terminals; each $A \in N$ has a dimension $\dim(A) \geq 1$, $\dim(A) \in \mathbb{N}$;
- $T$ is a finite set of terminals;
- $F$ is a finite set of mcf-functions (see below);
- $P$ is a finite set of rules of the form $A_0 \to f[A_1, \ldots, A_k]$ with $k \geq 0$, $f \in F$ such that $f : (T^*)^{\dim(A_1)} \times \cdots \times (T^*)^{\dim(A_k)} \to (T^*)^{\dim(A_0)}$;
- $S \in N$ is the start symbol, with $\dim(S) = 1$.
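
As an illustration of the rule format, the following is a minimal Python sketch of an MCFG rule whose function $f$ is given by its value: a tuple of components, each a sequence of terminals and argument components. The encoding (`Rule`, `is_mcf_rule`, `apply_rule`, 0-based index pairs) and the example grammar for $\{a^n b^n c^n d^n \mid n \geq 1\}$ are assumptions made for this sketch, not part of the course material.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple, Union

# Hypothetical encoding: a symbol in the value of f is either a terminal
# string or a pair (argument index, component index), both 0-based.
Symbol = Union[str, Tuple[int, int]]


@dataclass(frozen=True)
class Rule:
    lhs: str                                   # A_0
    rhs: Tuple[str, ...]                       # A_1, ..., A_k
    value: Tuple[Tuple[Symbol, ...], ...]      # components of f's value


def is_mcf_rule(rule: Rule, dim: Dict[str, int]) -> bool:
    """Check the dimension of the lhs and the 'at most once' condition."""
    if len(rule.value) != dim[rule.lhs]:
        return False
    used = [s for comp in rule.value for s in comp if not isinstance(s, str)]
    in_bounds = all(
        0 <= a < len(rule.rhs) and 0 <= c < dim[rule.rhs[a]] for a, c in used
    )
    # No component of an rhs argument may occur more than once.
    return in_bounds and len(used) == len(set(used))


def apply_rule(rule: Rule, args: Sequence[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Compute f(args) on tuples of strings by plain concatenation."""
    return tuple(
        "".join(s if isinstance(s, str) else args[s[0]][s[1]] for s in comp)
        for comp in rule.value
    )


# Example grammar (an assumption of this sketch):
#   S -> f1[A] := <A(1) A(2)>
#   A -> f2[A] := <a A(1) b, c A(2) d>
#   A -> f3[]  := <ab, cd>
DIM = {"S": 1, "A": 2}
F1 = Rule("S", ("A",), (((0, 0), (0, 1)),))
F2 = Rule("A", ("A",), (("a", (0, 0), "b"), ("c", (0, 1), "d")))
F3 = Rule("A", (), (("a", "b"), ("c", "d")))
```

Applying `F3`, then `F2`, then `F1` with `apply_rule` builds the single string of the start symbol from the two discontinuous parts carried by `A`.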

Multiple Context-Free Grammars (3)

$f$ is an mcf-function if there is a $k \geq 0$ and there are $d_i > 0$ for $0 \leq i \leq k$ such that $f$ is a total function from $(T^*)^{d_1} \times \cdots \times (T^*)^{d_k}$ to $(T^*)^{d_0}$ such that
- the components of $f(\vec{x}_1, \ldots, \vec{x}_k)$ are concatenations of a finite number of terminal symbols and the components $x_{ij}$ of the $\vec{x}_i$ ($1 \leq i \leq k$, $1 \leq j \leq d_i$), and
- the components $x_{ij}$ of the $\vec{x}_i$ are used at most once in the components of $f(\vec{x}_1, \ldots, \vec{x}_k)$.

Multiple Context-Free Grammars (4)

Given an input string $w$, each $A \in N$ can be considered as a predicate that is true for certain vectors of substrings of $w$. To distinguish between different substrings containing the same terminal symbols, we introduce ranges:

Let $w$ be the input word, $w = w_1 \ldots w_n$.
- $Pos(w) := \{0, \ldots, n\}$.
- We call a pair $\langle l, r \rangle \in Pos(w) \times Pos(w)$ with $l \leq r$ a range in $w$. Its yield $\langle l, r \rangle(w)$ is the substring $w_{l+1} \ldots w_r$.
- For two ranges $\rho_1 = \langle l_1, r_1 \rangle$, $\rho_2 = \langle l_2, r_2 \rangle$: if $r_1 = l_2$, then $\rho_1 \cdot \rho_2 = \langle l_1, r_2 \rangle$; otherwise $\rho_1 \cdot \rho_2$ is undefined.

Multiple Context-Free Grammars (5)

Two ranges $\langle l_1, r_1 \rangle$, $\langle l_2, r_2 \rangle$ are overlapping if either
a) $l_1 \leq l_2 < r_1$ and $l_1 < r_2$, or
b) $l_1 < r_2 \leq r_1$ and $l_2 < r_1$.

A $\vec{\rho} \in (Pos(w) \times Pos(w))^k$ is a $k$-dimensional range vector for $w$ iff $\vec{\rho} = \langle \langle l_1, r_1 \rangle, \ldots, \langle l_k, r_k \rangle \rangle$ with
a) $\langle l_i, r_i \rangle$ is a range in $w$ for $1 \leq i \leq k$, and
b) the elements of $\vec{\rho}$ are pairwise non-overlapping.

We then define $\vec{\rho}(w) := \langle \langle l_1, r_1 \rangle(w), \ldots, \langle l_k, r_k \rangle(w) \rangle$.

Multiple Context-Free Grammars (6)

Now we define the range vectors in the yield of a given predicate $A$ with respect to $w$:
- For every terminating rule $A \to f[\,]$ and every range $\langle l, r \rangle$: if $\langle l, r \rangle(w) = f[\,]$, then $A(\langle l, r \rangle)$.
- Let $A \to f[A_1, \ldots, A_k]$ be a production and $\vec{\rho}_i$ range vectors with $A_i(\vec{\rho}_i)$ for $1 \leq i \leq k$. We now apply $f$ directly to the range vectors while mapping the terminals in the lhs to appropriate ranges of length 1. This way, $f$ is no longer a function and it is no longer defined for all range vectors. (In some cases, it might yield undefined concatenations of ranges.) For all $\vec{\rho} \in f(\vec{\rho}_1, \ldots, \vec{\rho}_k)$: $A(\vec{\rho})$.
- For any other $\vec{\rho}$, the predicate $A$ is false.

The language of an MCFG $G$ is $\{w \in T^* \mid S(\langle 0, n \rangle) \text{ with respect to } w\}$, where $n = |w|$.
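
The range definitions translate almost directly into code. The sketch below is a minimal Python rendering under stated assumptions: ranges are `(l, r)` integer pairs, the yield $\langle l, r \rangle(w)$ is the slice `w[l:r]`, and the overlap condition is checked in both argument orders (that symmetric reading is an assumption of this sketch, since condition a)/b) as literally stated does not cover a first range lying strictly inside the second). All function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]


def is_range(rng: Range, w: str) -> bool:
    """<l, r> is a range in w iff 0 <= l <= r <= n."""
    l, r = rng
    return 0 <= l <= r <= len(w)


def range_yield(rng: Range, w: str) -> str:
    """The yield <l, r>(w) = w_{l+1} ... w_r, i.e. the Python slice w[l:r]."""
    l, r = rng
    return w[l:r]


def concat(first: Range, second: Range) -> Optional[Range]:
    """Range concatenation; None stands for 'undefined'."""
    return (first[0], second[1]) if first[1] == second[0] else None


def _overlap_as_stated(first: Range, second: Range) -> bool:
    """Conditions a) and b) exactly as given in the definition."""
    (l1, r1), (l2, r2) = first, second
    return (l1 <= l2 < r1 and l1 < r2) or (l1 < r2 <= r1 and l2 < r1)


def overlapping(first: Range, second: Range) -> bool:
    # Assumption of this sketch: apply the stated test in both orders.
    return _overlap_as_stated(first, second) or _overlap_as_stated(second, first)


def is_range_vector(rho: Sequence[Range], w: str) -> bool:
    """All elements are ranges in w and are pairwise non-overlapping."""
    return all(is_range(r, w) for r in rho) and not any(
        overlapping(rho[i], rho[j])
        for i in range(len(rho))
        for j in range(i + 1, len(rho))
    )


def vector_yield(rho: Sequence[Range], w: str) -> Tuple[str, ...]:
    """rho(w): the tuple of yields of the elements of rho."""
    return tuple(range_yield(r, w) for r in rho)
```

For instance, with `w = "aabbccdd"`, the vector `[(0, 4), (4, 8)]` is a range vector whose yield is the pair of substrings spanned by its two ranges, while `[(0, 4), (3, 8)]` is rejected because its ranges overlap.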

CYK Parsing: the basic algorithm (1)

Seki et al. (1991).

Idea: process the input from left to right and, starting with the terminating rules, calculate for each position $i$ all predicates $A$ together with their yield position vectors whose rightmost yield component ends at some position $j \leq i$.

$w$ is in the language iff $S$ with position vector $\langle 0, n \rangle$ is in the final set.

CYK Parsing: the basic algorithm (2)

Deduction rules:

Items $[A, \vec{\rho}]$ with $A \in N$ and $\vec{\rho}$ a $\dim(A)$-dimensional range vector in $w$.

Axioms:
\[
\frac{}{[A, \vec{\rho}]} \quad A \to f[\,],\ f[\,] = \vec{\rho}(w)
\]

Complete:
\[
\frac{[A_1, \vec{\rho}_1], \ldots, [A_m, \vec{\rho}_m]}{[A, \vec{\rho}]} \quad A \to f[A_1, \ldots, A_m],\ \vec{\rho} \in f[\vec{\rho}_1, \ldots, \vec{\rho}_m]
\]

Goal item: $[S, \langle 0, n \rangle]$

CYK Parsing: the naïve algorithm (1)

Problem with the basic CYK algorithm: in order to perform a complete step, one has to find items for all arguments of a rhs at the same time.

Burden & Ljunglöf (2005) propose to modify the basic CYK algorithm such that only one daughter needs to be found at a time. This amounts to a binarization with dotted items for partially completed rhs (similar to Chomsky Normal Form for CFGs).

Such items must contain all range vectors for the already recognized predicates of the rhs.

CYK Parsing: the naïve algorithm (2)

We know that the components of the arguments of the rhs predicates occur as single elements in the arguments of the lhs. We refer to the $k$th component of $A_i$ as $A_i^{(k)}$.

Then we can write a rewriting rule as follows:
\[
A_0 \to f[A_1, \ldots, A_n] := \langle x_1, \ldots, x_k \rangle
\]
where $k = \dim(A_0)$ and $x_i \in (T \cup \{A^{(m)} \mid A \in \{A_1, \ldots, A_n\},\ m \in \{1, \ldots, \dim(A)\}\})^*$.

The vector $\vec{x} = \langle x_1, \ldots, x_k \rangle$ is called a range constraint vector.
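
The deduction rules of the basic algorithm (axioms, complete, goal item) can be read as a fixpoint computation over a chart of items $[A, \vec{\rho}]$. The Python sketch below is one such reading, not the position-by-position procedure of Seki et al.: rules are plain tuples `(lhs, rhs, components)`, terminals are single characters, variables are 0-based `(argument, component)` pairs, and the overlap test is the usual interval intersection for non-empty ranges. These encodings, the function names and the example grammar are assumptions of the sketch.

```python
from itertools import product


def overlapping(a, b):
    # Simplification: usual intersection test for non-empty ranges.
    return max(a[0], b[0]) < min(a[1], b[1])


def component_ranges(comp, args, w):
    """All ranges obtained by concatenating the symbols of one component.

    A terminal is mapped to a range of length 1 at the current position;
    a variable (a, c) is replaced by the range args[a][c], which must start
    at the current position for the concatenation to be defined.
    """
    found = []
    for start in range(len(w) + 1):
        pos = start
        for sym in comp:
            if isinstance(sym, str):
                if pos < len(w) and w[pos] == sym:
                    pos += 1
                else:
                    break
            else:
                left, right = args[sym[0]][sym[1]]
                if left != pos:
                    break
                pos = right
        else:  # no break: every symbol was matched
            found.append((start, pos))
    return found


def instantiate(rule, args, w):
    """Yield every range vector rho in f[rho_1, ..., rho_m]."""
    _, _, comps = rule
    for rho in product(*(component_ranges(c, args, w) for c in comps)):
        pairs = ((i, j) for i in range(len(rho)) for j in range(i + 1, len(rho)))
        if not any(overlapping(rho[i], rho[j]) for i, j in pairs):
            yield rho


def recognize(rules, start, w):
    """Close the chart under Axioms and Complete, then test the goal item."""
    chart = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            lhs, rhs, _ = rule
            # Snapshot of the range vectors currently known for each daughter.
            daughters = [[rho for (a, rho) in chart if a == b] for b in rhs]
            for args in product(*daughters):  # one empty choice if rhs is empty
                for rho in instantiate(rule, args, w):
                    if (lhs, rho) not in chart:
                        chart.add((lhs, rho))
                        changed = True
    return (start, ((0, len(w)),)) in chart


# Example grammar (an assumption of this sketch) for a^n b^n c^n d^n, n >= 1.
RULES = [
    ("S", ("A",), (((0, 0), (0, 1)),)),
    ("A", ("A",), (("a", (0, 0), "b"), ("c", (0, 1), "d"))),
    ("A", (), (("a", "b"), ("c", "d"))),
]
```

A terminating rule is handled as a Complete step with no antecedents, so the axioms need no separate case. The sketch favours brevity over efficiency: every pass re-enumerates all daughter combinations.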

CYK Parsing: the naïve algorithm (3)

Given a $w$, we can map the terminal symbols in a range constraint vector to ranges in $w$:

Let $\vec{x}$ be a range constraint vector and $x$ a component of $\vec{x}$. We define:
- if $x \in T^*$, then $[\![x]\!]_w = \{\langle l, r \rangle \mid \langle l, r \rangle(w) = x\}$;
- if $x = y V z$ with $V = A^{(m)}$, then $[\![x]\!]_w = \{\alpha_1 A^{(m)} \alpha_2 \mid \alpha_1 \in [\![y]\!]_w,\ \alpha_2 \in [\![z]\!]_w\}$.

$[\![\vec{x}]\!]_w$ is then obtained by applying this to all components of $\vec{x}$ such that the ranges occurring in the result are all pairwise non-overlapping.

CYK Parsing: the naïve algorithm (4)

Naïve algorithm: passive items $[A, \vec{\rho}]$ and active items $[A_0 \to f[\vec{A} \bullet \vec{A}']; \phi]$, where the components of $\phi$ are concatenations of ranges and variables $A^{(i)}$.

Predict introduces new axioms:
\[
\frac{}{[A \to f[\bullet \vec{B}]; \phi]} \quad A \to f[\vec{B}] := \vec{x} \text{ and } \phi \in [\![\vec{x}]\!]_w
\]

Note that this is a completely blind prediction: any rule is predicted as being potentially used.

CYK Parsing: the naïve algorithm (5)

Convert turns a completely recognized active item into a passive item:
\[
\frac{[A \to f[\vec{B} \bullet]; \phi]}{[A; \phi]}
\]

Complete moves the dot over a non-terminal if a corresponding passive item exists:
\[
\frac{[A \to f[\vec{B} \bullet B_k \vec{B}']; \phi], \quad [B_k; \psi]}{[A \to f[\vec{B} B_k \bullet \vec{B}']; \phi']} \quad \phi' = \phi[B_k / \psi]
\]

Here, $\phi[B_k / \psi]$ means replacing every occurrence of $B_k^{(i)}$ in $\phi$ with $\psi(i)$.

CYK Parsing: the active algorithm (1)

Idea: use the dot to traverse the range constraint vector $\phi$.

Passive items $[A; \Gamma]$ as before.

Active items $[A \to f[\vec{B}]; (\phi, \rho \bullet x, \psi); \Gamma]$

Such an active item indicates that the first arguments of $A$ have been recognized, yielding the ranges $\phi$, and that the next argument has been recognized up to the position marked by the dot, so far yielding $\rho$. The rest of this argument (range constraints $x$) and the following arguments (range constraints $\psi$) are still waiting for completion.

$\Gamma$ contains range vectors for the predicates in $\vec{B}$ if these have been found; otherwise it contains the variables $B_k^{(i)}$ for these ranges.
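
Two operations of the naïve algorithm lend themselves to a small illustration: mapping the terminals of one range constraint component to ranges in $w$, and the substitution $\phi[B_k / \psi]$ used by Complete. The Python sketch below makes several assumptions of its own: a variable $B^{(i)}$ is a `Var(nt, comp)` named tuple with a 1-based component index, adjacent terminals are grouped into one maximal stretch, empty terminal stretches are skipped, and the pairwise non-overlap filter across the components of a whole vector is left out.

```python
from collections import namedtuple
from itertools import product

# Hypothetical encoding of a variable B^(i): non-terminal name, 1-based index.
Var = namedtuple("Var", ["nt", "comp"])


def ranges_with_yield(s, w):
    """All ranges <l, r> in w whose yield is the terminal string s."""
    return [(l, l + len(s)) for l in range(len(w) - len(s) + 1)
            if w[l:l + len(s)] == s]


def instantiate_component(comp, w):
    """Map the terminal stretches of one component to ranges in w.

    comp is a sequence of single-character terminals and Var objects.
    Each result is a sequence of ranges and (unchanged) variables.
    """
    parts = []      # one list of alternatives per stretch or variable
    stretch = ""
    for sym in comp:
        if isinstance(sym, Var):
            if stretch:
                parts.append(ranges_with_yield(stretch, w))
                stretch = ""
            parts.append([sym])
        else:
            stretch += sym
    if stretch:
        parts.append(ranges_with_yield(stretch, w))
    return [tuple(choice) for choice in product(*parts)]


def substitute(phi, nt, psi):
    """phi[B_k / psi]: replace every B_k^(i) in phi by the range psi(i)."""
    return tuple(
        tuple(psi[s.comp - 1] if isinstance(s, Var) and s.nt == nt else s
              for s in comp)
        for comp in phi
    )
```

With `w = "abab"`, the component `("a", Var("B", 1), "b")` has four instantiations, one for each choice of an `a` range and a `b` range; only those whose ranges can later be concatenated with the range substituted for `Var("B", 1)` will lead to a passive item.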

CYK Parsing: the active algorithm (2)

Predict introduces a new rule with the dot on the left of its range constraint vector:
\[
\frac{}{[A \to f[\vec{B}]; (\bullet x, \Psi); \Gamma_{\vec{B}}]} \quad A \to f[\vec{B}] := (x, \Psi)
\]
($\Gamma_{\vec{B}}$ contains the range variables for the vector $\vec{B}$.)

Complete moves a dot that is at the end of an argument to the next argument:
\[
\frac{[A \to f[\vec{B}]; (\Phi, \alpha \bullet, x, \Psi); \Gamma]}{[A \to f[\vec{B}]; (\Phi, \alpha, \bullet x, \Psi); \Gamma]}
\]

Scan moves the dot over a terminal:
\[
\frac{[A \to f[\vec{B}]; (\Phi, \alpha \bullet a x, \Psi); \Gamma]}{[A \to f[\vec{B}]; (\Phi, \alpha \langle l, r \rangle \bullet x, \Psi); \Gamma]} \quad \langle l, r \rangle(w) = a
\]

CYK Parsing: the active algorithm (3)

Combine moves the dot over a non-terminal if the corresponding passive item has been found:
\[
\frac{[A \to f[\vec{B}]; (\Phi, \alpha \bullet B_k^{(i)} x, \Psi); \Gamma], \quad [B_k; \vec{\rho}]}{[A \to f[\vec{B}]; (\Phi, \alpha\, \vec{\rho}(i) \bullet x, \Psi); \Gamma']} \quad \Gamma(k) \text{ compatible with } \vec{\rho},\ \Gamma' = \Gamma(k, i := \vec{\rho}(i))
\]
("Compatible" means: for every $1 \leq i \leq \dim(B_k)$, either $\Gamma(k)(i) = \vec{\rho}(i)$ or $\Gamma(k)(i) = B_k^{(i)}$.)

Convert turns a fully recognized active item into a passive item:
\[
\frac{[A \to f[\vec{B}]; (\Phi, \alpha \bullet); \Gamma]}{[A; (\Phi, \alpha)]}
\]

CYK Parsing: the incremental algorithm (1)

Problem of the active algorithm: only passive items are used in combine steps. That is, in a situation where the dot precedes $A^{(i)}$, in order to use the $i$th component of the predicate $A$, all the other components of $A$ must already have been recognized.

Better: process incrementally and allow active items to be used in combine steps.

Idea: read one token at a time and calculate all possible consequences of that token before the next token is read.

CYK Parsing: the incremental algorithm (2)

We now use explicit features $r_1, \ldots, r_k$ for the range constraints of the $k$ ranges of a predicate $A$. This way, the argument index is no longer given by the position in the range constraint vector and we can process the arguments in any order.

Only active items: $[A \to f[\vec{B}]; (\phi, r_i = \rho \bullet x, \psi); \Gamma]$

These are as in the active algorithm, except that the order in the range constraint vectors need not be the same as in the original rule in the grammar.
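
The side condition of the active algorithm's Combine rule can be made concrete with a small Python sketch. It assumes that $\Gamma$ is a tuple with one entry per rhs predicate, each entry a tuple of ranges, and that a still unbound variable $B_k^{(i)}$ is represented by `None`; indices are 0-based here, unlike in the rule. The names `compatible` and `bind` are hypothetical.

```python
def compatible(gamma_k, rho):
    """Gamma(k) is compatible with rho iff every component of Gamma(k)
    is either still unbound (None) or equal to the matching range in rho."""
    return len(gamma_k) == len(rho) and all(
        g is None or g == r for g, r in zip(gamma_k, rho)
    )


def bind(gamma, k, i, rng):
    """Gamma(k, i := rng): a copy of Gamma with component i of entry k set."""
    entry = gamma[k][:i] + (rng,) + gamma[k][i + 1:]
    return gamma[:k] + (entry,) + gamma[k + 1:]


def combine_gamma(gamma, k, i, rho):
    """Return the updated Gamma' of a Combine step, or None if the passive
    item [B_k; rho] is not compatible with what is already recorded."""
    if not compatible(gamma[k], rho):
        return None
    return bind(gamma, k, i, rho[i])
```

Once one component of $B_k$ has been bound, later Combine steps for the other components of the same $B_k$ only succeed with passive items that agree on that range, which is what keeps the components of one daughter consistent.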

CYK Parsing: the incremental algorithm (3)

Predict introduces a new rule with the dot on the left of one of the arguments of the lhs:
\[
\frac{}{[A \to f[\vec{B}]; (r_i = \langle k, k \rangle \bullet x, \Psi_1, \Psi_2); \Gamma_{\vec{B}}]} \quad A \to f[\vec{B}] := (\Psi_1, x, \Psi_2)
\]
with $x$ the $i$th element, $1 \leq i \leq \dim(A)$, $1 \leq k \leq n$. ($\Gamma_{\vec{B}}$ contains the range variables for the vector $\vec{B}$.)

Complete moves a dot that is at the end of an argument to another argument:
\[
\frac{[A \to f[\vec{B}]; (\Phi, r_i = \langle l_i, r_i \rangle \bullet, \Psi_1, r_j = x, \Psi_2); \Gamma]}{[A \to f[\vec{B}]; (\Phi, r_i = \langle l_i, r_i \rangle, r_j = \langle k, k \rangle \bullet x, \Psi_1, \Psi_2); \Gamma]} \quad r_i \leq k \leq n
\]

CYK Parsing: the incremental algorithm (4)

Scan moves the dot over a terminal:
\[
\frac{[A \to f[\vec{B}]; (\Phi, r_i = \langle l, r \rangle \bullet a x, \Psi); \Gamma]}{[A \to f[\vec{B}]; (\Phi, r_i = \langle l, r + 1 \rangle \bullet x, \Psi); \Gamma]} \quad \langle r, r + 1 \rangle(w) = a
\]

Combine moves the dot over a non-terminal if the corresponding item has been found:
\[
\frac{[A \to f[\vec{B}]; (\Phi_1, r_j = \alpha \bullet B_k^{(i)} x, \Psi_1); \Gamma], \quad [B_k; (\Phi_2, r_i = \beta, \Psi_2)]}{[A \to f[\vec{B}]; (\Phi_1, r_j = \alpha \beta \bullet x, \Psi_1); \Gamma(k, i := \beta)]} \quad \Gamma(k) \text{ compatible with } (\Phi_2)
\]
("Compatible" means: for every $1 \leq h \leq \dim(B_k)$, if $r_h = \alpha_h \in (\Phi_2)$, then $\Gamma(k)(h) = \alpha_h$.)

CYK Parsing: Prediction strategies

Problem of the predict operations presented above: we compute partial results that are not reachable given the predicates we are looking for or the predicates we have already found.

Solution: replace the unrestricted predict rule with more intelligent predictions.

Possible strategies: $A \to f[\vec{B}]$ with the dot left of $r_i = \alpha$ is only predicted if
- there is another item looking for $A^{(i)}$ (top-down prediction), or
- there is a passive item that has found the first symbol in $\alpha$ (bottom-up prediction).

Conclusion

Starting point: the basic algorithm (Seki et al.).

Refinement: decompose single items and deduction steps into different items and smaller deduction steps. This gives the naïve algorithm (Burden & Ljunglöf, 2005).

Further refinement: divide the combine rule into complete, scan and combine. This gives the active algorithm (Burden & Ljunglöf, 2005).

Further refinement: predict and complete can select from any possible remaining range constraint, not just the following one. This gives the incremental algorithm (Burden & Ljunglöf, 2005).

The algorithms from Burden & Ljunglöf (2005) have been implemented in the Grammatical Framework system (Ranta, 2004).