A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
Slide 1: A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
Wei Lu and Hwee Tou Ng, National University of Singapore
Slide 2: The Task
(Logical form) λx₀.state(x₀) ∧ ∃x₁.[loc(miss_r, x₁) ∧ state(x₁) ∧ next_to(x₁, x₀)]
(Natural language sentence) "give me the states bordering states that the mississippi runs through"
Slide 3: The Task
(Logical form) argmax(x, river(x) ∧ ∃y.[state(y) ∧ next_to(y, indiana_s) ∧ loc(x, y)], len(x))
(Natural language sentence) ???
Slide 4: Challenges
How to transform complex logical forms with rich internal structures into text?
Major contribution (1): a novel forest-to-string algorithm
- A novel packed forest representation of formal semantics (λ-expressions), and a novel reduction-based weighted binary SCFG for language generation
- Inspired by the hierarchical phrase-based translation model (Chiang 2005, 2007)
Slide 5: Challenges
How to automatically acquire the lexicon that maps logical terms to natural language words?
Major contribution (2): a novel grammar induction algorithm
- Acquires synchronous grammar rules by learning the correspondence between logical sub-expressions and (possibly discontiguous) natural language word sequences
- Inspired by the hybrid tree model (Lu et al. 2008)
Slide 6: Previous Work
From logical/semantic forms, but not probabilistic:
- Wang (1980), On computational sentence generation from logical form
- Shieber et al. (1990), Semantic-head-driven generation
Probabilistic, but from specialized representations:
- Variable-free tree-structured representations: Wong and Mooney (2007), generation by inverting a semantic parser that uses statistical machine translation; Lu et al. (2009), natural language generation with tree conditional random fields
- Database entries: Angeli et al. (2010), a simple domain-independent probabilistic approach to generation
From formal logical forms, and probabilistic:
- This work
Slide 9: Notes on λ-calculus
Alternative notations for functional application: f g ≡ (f g); f g h ≡ ((f g) h)
Types:
- Basic types: e (entity), t (truth value)
- Composite types: ⟨e,t⟩ takes in type e and returns type t
Conversions:
- α-conversion: λy.state(y) ⇔ λx.state(x)
- β-reduction: (λy.λx.loc(y, x)) miss_r ⇒ λx.loc(miss_r, x)
- (Restricted) higher-order unification (Kwiatkowski et al. 2010): λx.loc(miss_r, x) ∧ state(x) = (λg.λf.λx.g(x) ∧ f(x)) (λx.loc(miss_r, x)) (λx.state(x))
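The conversions above can be sketched in code. This is a minimal sketch, not the authors' implementation: the tuple term encoding and function names are my own, and capture-avoiding substitution is omitted for brevity.

```python
# Toy lambda-term encoding: ('var', name), ('const', name),
# ('lam', name, body), ('app', fun, arg).

def substitute(term, name, value):
    """Replace free occurrences of variable `name` in `term` by `value`.
    NOTE: no capture avoidance (alpha-renaming) -- fine for this tiny example."""
    tag = term[0]
    if tag == 'var':
        return value if term[1] == name else term
    if tag == 'const':
        return term
    if tag == 'lam':
        _, v, body = term
        if v == name:                      # `name` is shadowed; stop here
            return term
        return ('lam', v, substitute(body, name, value))
    _, f, a = term                         # 'app'
    return ('app', substitute(f, name, value), substitute(a, name, value))

def beta_reduce(term):
    """Normalize by repeatedly contracting redexes."""
    tag = term[0]
    if tag == 'lam':
        return ('lam', term[1], beta_reduce(term[2]))
    if tag == 'app':
        f, a = beta_reduce(term[1]), beta_reduce(term[2])
        if f[0] == 'lam':                  # (lam x. body) a => body[x := a]
            return beta_reduce(substitute(f[2], f[1], a))
        return ('app', f, a)
    return term

# (lam y. lam x. loc(y, x)) miss_r  =>  lam x. loc(miss_r, x)
loc_yx = ('app', ('app', ('const', 'loc'), ('var', 'y')), ('var', 'x'))
term = ('app', ('lam', 'y', ('lam', 'x', loc_yx)), ('const', 'miss_r'))
print(beta_reduce(term))                   # lam x. loc(miss_r, x)
```

The printed term is the β-normal form used throughout the later slides.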
Slide 10: Packed Meaning Forest
λx.loc(miss_r, x) ∧ state(x) can be decomposed in two ways:
(1) (λg.λf.λx.g(x) ∧ f(x)) (λx.loc(miss_r, x)) (λx.state(x))
(2) (λg.λf.λx.f(x) ∧ g(x)) (λx.state(x)) (λx.loc(miss_r, x))
(1) "the mississippi runs through which states"; (2) "which states have the mississippi river"
Slide 11: Packed Meaning Forest
(λg.λf.λx.g(x) ∧ f(x)) (λx.loc(miss_r, x)) (λx.state(x))
can be decomposed further, since λx.loc(miss_r, x) = (λy.λx.loc(y, x)) miss_r, giving:
(λg.λf.λx.g(x) ∧ f(x)) ((λy.λx.loc(y, x)) miss_r) (λx.state(x))
Slide 13: Packed Meaning Forest
Forest nodes (type : λ-fragment | reduced expression | associated NL phrase):
- r : ⟨e,t⟩ | λx.loc(miss_r, x) ∧ state(x) | "states that the mississippi river runs through"
- ⟨e,t⟩ : (λg.λf.λx.g(x) ∧ f(x)) ⟨e,t⟩₁ ⟨e,t⟩₂ | λx.loc(miss_r, x) ∧ state(x) | "states that the mississippi river runs through"
- ⟨e,t⟩ : (λg.λf.λx.f(x) ∧ g(x)) ⟨e,t⟩₁ ⟨e,t⟩₂ | λx.loc(miss_r, x) ∧ state(x) | "states that the mississippi river runs through"
- ⟨e,t⟩ : λx.state(x) | λx.state(x) | "states"
- ⟨e,t⟩ : (λy.λx.loc(y, x)) e₁ | λx.loc(miss_r, x) | "that the mississippi river runs through"
- e : miss_r | miss_r | "the mississippi river"
- ⟨e,t⟩ : λx.loc(miss_r, x) | λx.loc(miss_r, x) | "that the mississippi river runs through"
Slide 14: Reduction-based Synchronous CFG
Grammar used for language generation:
(3) ⟨e,t⟩ → ⟨ (λy.λx.loc(y, x)) e₁ , "that e₁ runs through" ⟩
(4) e → ⟨ miss_r , "the mississippi" ⟩
A derivation with (3)+(4):
⟨e,t⟩ : ⟨ λx.loc(miss_r, x) , "that the mississippi runs through" ⟩
where λx.loc(miss_r, x) = (λy.λx.loc(y, x)) miss_r
How do we automatically induce such a grammar from data?
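The composition of rules (3) and (4) above can be sketched as follows. This is a hedged illustration, not the paper's code: the rule encoding, slot markers, and `derive` helper are my own, and the semantics is composed as a string rather than a real λ-term.

```python
# rule (4): e -> < miss_r , "the mississippi" >
rule4 = {"lhs": "e", "sem": "miss_r", "words": ["the", "mississippi"]}

# rule (3): <e,t> -> < (lam y. lam x. loc(y, x)) e_1 , "that e_1 runs through" >
rule3 = {"lhs": "<e,t>",
         "sem": "(lam y. lam x. loc(y, x)) {0}",
         "words": ["that", "[1]", "runs", "through"]}

def derive(rule, child=None):
    """Combine a rule with one child derivation: splice the child's words
    into slot [1] and its semantics into the {0} argument position."""
    words = []
    for tok in rule["words"]:
        words.extend(child["words"] if tok == "[1]" and child else [tok])
    sem = rule["sem"].format(child["sem"]) if child else rule["sem"]
    return {"sem": sem, "words": words}

d4 = derive(rule4)
d3 = derive(rule3, d4)
print(" ".join(d3["words"]))   # that the mississippi runs through
print(d3["sem"])               # (lam y. lam x. loc(y, x)) miss_r
```

The source side of `d3` β-reduces to λx.loc(miss_r, x), matching the derivation on the slide.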
Slide 17: Grammar Induction
Problem: how to find mappings between λ-sub-expressions and NL words in an unsupervised manner?
Challenges:
- Logical forms (e.g., λ-expressions) have complex internal structures and variable dependencies
- Text-to-text aligners (GIZA++, Berkeley aligner) are not applicable
Solution: the λ-hybrid tree, a new generative model that explicitly models the correspondence between λ-sub-expressions and NL word sequences.
Slide 20: λ-hybrid Tree
A tree whose leaves are natural language words and whose internal nodes are λ-productions, generated by an underlying joint generative process.
Extensions to Lu et al. (2008):
- Internal nodes involve λ-expressions
- The meaning representation has a packed-forest representation
Example tree: root ⟨e,t⟩₂ : (λg.λf.λx.g(x) ∧ f(x)) ⟨e,t⟩₁ ⟨e,t⟩₂; child ⟨e,t⟩₂ : λx.state(x) yields "states"; child ⟨e,t⟩₁ : (λy.λx.loc(y, x)) e₁ yields "that e₁ runs through"; e₁ : miss_r yields "the mississippi".
Slide 21: λ-hybrid Tree
Example (partial) hybrid tree T: production p₁ = ⟨e,t⟩ : (λy.λx.loc(y, x)) e covers "that e₁ runs through"; its child p₂ = e : miss_r covers "the mississippi".
P(T) = φ(m: wYw | p₁) × ψ("that e₁ runs through" | p₁) × ρ(p₂ | p₁, arg₁) × φ(m: w | p₂) × ψ("the mississippi" | p₂)
where φ are pattern parameters, ψ are emission parameters, and ρ are MR model parameters.
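The factorization of P(T) on the slide above is just a product of three kinds of parameters. The sketch below is purely illustrative: the numeric parameter values are invented, and only the structure of the product follows the model.

```python
# Invented parameter tables keyed the way the P(T) factorization reads:
phi = {("wYw", "p1"): 0.4, ("w", "p2"): 0.7}          # pattern parameters
psi = {("that e1 runs through", "p1"): 0.05,           # emission parameters
       ("the mississippi", "p2"): 0.1}
rho = {("p2", "p1", "arg1"): 0.6}                      # MR model parameters

# P(T) = phi * psi * rho * phi * psi, one factor per generative decision
p_T = (phi[("wYw", "p1")]
       * psi[("that e1 runs through", "p1")]
       * rho[("p2", "p1", "arg1")]
       * phi[("w", "p2")]
       * psi[("the mississippi", "p2")])
print(p_T)
```

In training these parameters are hidden-structure estimates rather than fixed numbers; the point here is only that each production contributes one pattern factor, one emission factor, and (for each child) one MR model factor.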
Slide 29: λ-hybrid Tree
The same λ-expression λx.loc(miss_r, x) ∧ state(x) paired with hybrid trees in two languages:
- English: root (λg.λf.λx.g(x) ∧ f(x)) ⟨e,t⟩₁ ⟨e,t⟩₂; ⟨e,t⟩₂ : λx.state(x) "states"; ⟨e,t⟩₁ : (λy.λx.loc(y, x)) e₁ "that e₁ runs through"; e₁ : miss_r "the mississippi"
- Chinese: same productions with 密西西比河 ("the Mississippi River") under e₁, 穿越的 ("that ... runs through") under ⟨e,t⟩₁, and 州 ("states") under ⟨e,t⟩₂
Slide 30: λ-hybrid Tree
Which hybrid tree is the correct one? Several segmentations are possible, differing in whether words such as "that", "the", "runs", and "through" attach to the parent production ⟨e,t⟩ : (λy.λx.loc(y, x)) e₁ or to the child e₁ : miss_r.
- Hybrid trees are hidden structures, which need to be estimated with the inside-outside algorithm
- We have developed an efficient algorithm that runs in time cubic in the number of words of the NL sentence
Slide 31: Grammar Induction, Overall Algorithm
1. For each training instance, construct its packed meaning forest.
2. Train the λ-hybrid tree generative model on the training set, find the most probable λ-hybrid tree for each training instance, and extract the grammar rules from it.
Slide 32: Rule Extraction, One-level Rules
From the hybrid tree node ⟨e,t⟩₁ : (λy.λx.loc(y, x)) e₁ covering "that e₁ runs through", extract:
⟨e,t⟩ → ⟨ (λy.λx.loc(y, x)) e₁ , "that e₁ runs through" ⟩
Slide 33: Rule Extraction, Subtree Rules
From the subtree rooted at ⟨e,t⟩₁, with the child e₁ : miss_r covering "the mississippi", extract:
⟨e,t⟩ → ⟨ λx.loc(miss_r, x) , "that the mississippi runs through" ⟩
Slide 34: Rule Extraction, Two-level Rules
Combining the root production with one child, via substitution, β-reductions, and α-conversion, yields:
⟨e,t⟩ → ⟨ (λy.λx.loc(y, x) ∧ state(x)) e₁ , "states that e₁ runs through" ⟩
Slide 35: Rule Extraction, Two-level Rules
Derivation of the two-level rule (substitution, then β-reductions, then α-conversion):
(λy.[(λg.λf.λx.g(x) ∧ f(x)) ((λy.λx.loc(y, x)) y) (λx.state(x))]) e₁
⇒ (λy.λx.loc(y, x) ∧ state(x)) e₁
Extracted rule: ⟨e,t⟩ → ⟨ (λy.λx.loc(y, x) ∧ state(x)) e₁ , "states that e₁ runs through" ⟩
Slide 39: Log-linear Model
We assign a score to each derivation D:
w(D) = ∏_{r ∈ D} ∏_i f_i(r)^{w_i} × p_LM(ŝ)^{w_LM}
Four simple and general features: 3 rule-specific features and 1 language model (LM) feature.
Feature weights are learned with Minimum Error Rate Training (Och 2003).
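The score w(D) is usually computed in log space. A minimal sketch, with invented feature values and weights (only the formula's shape comes from the slide):

```python
import math

def score(derivation_feats, w, p_lm, w_lm):
    """log w(D) = sum over rules r, features i of w_i * log f_i(r)
                  + w_LM * log p_LM(s-hat).
    derivation_feats: one positive feature vector f_i(r) per rule r in D."""
    log_w = sum(wi * math.log(fi)
                for feats in derivation_feats
                for wi, fi in zip(w, feats))
    return log_w + w_lm * math.log(p_lm)

w = [0.5, 0.3, 0.2]            # three rule-specific feature weights (invented)
feats = [[0.4, 0.9, 1.0],      # rule 1's feature values (invented)
         [0.7, 0.8, 1.0]]      # rule 2's feature values (invented)
print(score(feats, w, p_lm=0.01, w_lm=1.0))
```

Working in log space turns the double product into a double sum, which is what MERT-style tuning and the decoder actually manipulate.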
Slide 40: Decoding
Forest-to-string decoding: for a given source expression e, find the most probable derivation D, as scored by w, that produces e; the target side gives the generated sentence ŝ:
ŝ = s( argmax_{D : e(D) = e} w(D) )
A bottom-up dynamic programming algorithm with cube pruning.
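The "cube" part of cube pruning can be illustrated on its own. A toy sketch (my own code, not the paper's decoder): given two k-best child lists sorted by score, a heap pops the best combined hypotheses lazily instead of enumerating the full cross-product.

```python
import heapq

def cube_top_k(left, right, k, combine=lambda a, b: a * b):
    """left/right: child scores in descending order; returns top-k combined
    scores, exploring the (i, j) grid outward from the corner (0, 0)."""
    heap = [(-combine(left[0], right[0]), 0, 0)]
    seen, out = {(0, 0)}, []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        for di, dj in ((1, 0), (0, 1)):     # push the two cube neighbors
            ni, nj = i + di, j + dj
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-combine(left[ni], right[nj]), ni, nj))
    return out

print(cube_top_k([0.9, 0.5, 0.1], [0.8, 0.4], k=3))  # approx [0.72, 0.4, 0.36]
```

With a pure product of probabilities this enumeration is exact; in the real decoder the LM feature makes combined scores non-monotone over the grid, so cube pruning is an approximate search.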
Slide 41: Automatic Evaluation
The Geoquery dataset (880 instances), annotated with complete sentences in both English and Chinese.
[Table: BLEU-1 and TER for English and Chinese, comparing Moses and Joshua baselines (text, preorder, inorder, and postorder input linearizations) against this work]
p < 0.01 for all cases, except for the comparison against Joshua-preorder (p < 0.05)
Slide 42: Importance of Different Rules
Subtree rules and two-level rules are capable of modeling some longer-range dependencies.
[Table: BLEU-1 and TER for English and Chinese, comparing the full system (all rules) against ablations without subtree rules and without two-level rules]
Slide 43: Human Evaluation
Randomly sampled 50% of the test examples; five judges each for both languages.
English (Flu, Sem): Moses 4.48 ± …, … ± 0.20; Joshua 4.40 ± …, … ± 0.18; This work 4.66 ± …, … ± 0.16
Chinese (Flu, Sem): Moses 4.14 ± …, … ± 0.17; Joshua 4.00 ± …, … ± 0.21; This work 4.59 ± …, … ± 0.10
p < 0.01 for all cases
Slide 44: Variable-free Datasets
The model can also be applied to variable-free datasets with tree-structured representations. For example, midfield(opp) is converted to (λx.midfield(x)) opp.
[Table: BLEU and NIST on Robocup (300) and Geoquery (880), comparing Wong and Mooney (2007) and Lu et al. (2009) against this work]
Slide 45: Conclusions
- Introduced a novel reduction-based binary SCFG with a forest-to-string algorithm for language generation from typed lambda calculus expressions represented as a packed meaning forest.
- Introduced a novel grammar induction algorithm, built on top of the λ-hybrid tree model, which models the joint generative process of both λ-expressions and natural language texts.
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationOut of GIZA Efficient Word Alignment Models for SMT
Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza
More informationOverview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments
Overview Learning phrases from alignments 6.864 (Fall 2007) Machine Translation Part III A phrase-based model Decoding in phrase-based models (Thanks to Philipp Koehn for giving me slides from his EACL
More informationProbabilistic Context-Free Grammar
Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612
More informationA DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005
A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):
More informationParsing with Context-Free Grammars
Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars
More informationEmpirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs
Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21
More informationCS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses)
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya CSE Dept., IIT Bombay 15 th Feb, 2011 Going forward
More informationCS460/626 : Natural Language
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 27 SMT Assignment; HMM recap; Probabilistic Parsing cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 17 th March, 2011 CMU Pronunciation
More informationDiscriminative Training
Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder
More informationNatural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Natural Language Processing CS 6840 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Parsing Define a probabilistic model of syntax P(T S):
More informationAdvances in Abstract Categorial Grammars
Advances in Abstract Categorial Grammars Language Theory and Linguistic Modeling Lecture 3 Reduction of second-order ACGs to to Datalog Extension to almost linear second-order ACGs CFG recognition/parsing
More informationOn rigid NL Lambek grammars inference from generalized functor-argument data
7 On rigid NL Lambek grammars inference from generalized functor-argument data Denis Béchet and Annie Foret Abstract This paper is concerned with the inference of categorial grammars, a context-free grammar
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationSoft Inference and Posterior Marginals. September 19, 2013
Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference
More informationA* Search. 1 Dijkstra Shortest Path
A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the
More informationA Stochastic l-calculus
A Stochastic l-calculus Content Areas: probabilistic reasoning, knowledge representation, causality Tracking Number: 775 Abstract There is an increasing interest within the research community in the design
More informationFast Consensus Decoding over Translation Forests
Fast Consensus Decoding over Translation Forests John DeNero Computer Science Division University of California, Berkeley denero@cs.berkeley.edu David Chiang and Kevin Knight Information Sciences Institute
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationUnit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing
CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang (lhuang@isi.edu) Big Picture we have already covered...
More informationShift-Reduce Word Reordering for Machine Translation
Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,
More informationIBM Model 1 for Machine Translation
IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of
More informationDiscrimina)ve Latent Variable Models. SPFLODD November 15, 2011
Discrimina)ve Latent Variable Models SPFLODD November 15, 2011 Lecture Plan 1. Latent variables in genera)ve models (review) 2. Latent variables in condi)onal models 3. Latent variables in structural SVMs
More informationGlobal Machine Learning for Spatial Ontology Population
Global Machine Learning for Spatial Ontology Population Parisa Kordjamshidi, Marie-Francine Moens KU Leuven, Belgium Abstract Understanding spatial language is important in many applications such as geographical
More informationParts 3-6 are EXAMPLES for cse634
1 Parts 3-6 are EXAMPLES for cse634 FINAL TEST CSE 352 ARTIFICIAL INTELLIGENCE Fall 2008 There are 6 pages in this exam. Please make sure you have all of them INTRODUCTION Philosophical AI Questions Q1.
More informationShift-Reduce Word Reordering for Machine Translation
Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,
More informationTribhuvan University Institute of Science and Technology Micro Syllabus
Tribhuvan University Institute of Science and Technology Micro Syllabus Course Title: Discrete Structure Course no: CSC-152 Full Marks: 80+20 Credit hours: 3 Pass Marks: 32+8 Nature of course: Theory (3
More informationNotes on the framework of Ando and Zhang (2005) 1 Beyond learning good functions: learning good spaces
Notes on the framework of Ando and Zhang (2005 Karl Stratos 1 Beyond learning good functions: learning good spaces 1.1 A single binary classification problem Let X denote the problem domain. Suppose we
More informationN-gram Language Modeling
N-gram Language Modeling Outline: Statistical Language Model (LM) Intro General N-gram models Basic (non-parametric) n-grams Class LMs Mixtures Part I: Statistical Language Model (LM) Intro What is a statistical
More informationLagrangian Relaxation Algorithms for Inference in Natural Language Processing
Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Alexander M. Rush and Michael Collins (based on joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Roi Reichart, David
More informationMachine Translation without Words through Substring Alignment
Machine Translation without Words through Substring Alignment Graham Neubig 1,2,3, Taro Watanabe 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 2 3 now at 1 Machine Translation Translate a source sentence F
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Raghavendra Udupa U Hemanta K Mai IBM India Research Laboratory, New Delhi {uraghave, hemantkm}@inibmcom Abstract Viterbi
More informationPersonal Project: Shift-Reduce Dependency Parsing
Personal Project: Shift-Reduce Dependency Parsing 1 Problem Statement The goal of this project is to implement a shift-reduce dependency parser. This entails two subgoals: Inference: We must have a shift-reduce
More information