A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
Slide 1: A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
Wei Lu and Hwee Tou Ng, National University of Singapore
Slide 2: The Task
(Logical form) λx₀.state(x₀) ∧ ∃x₁.[loc(miss_r, x₁) ∧ state(x₁) ∧ next_to(x₁, x₀)]
(Natural language sentence) "give me the states bordering states that the mississippi runs through"
Slide 3: The Task
(Logical form) argmax(x, river(x) ∧ ∃y.[state(y) ∧ next_to(y, indiana_s) ∧ loc(x, y)], len(x))
(Natural language sentence) ???
Slide 4: Challenges
How to transform complex logical forms with rich internal structures into text?
Major contribution (1): a novel forest-to-string algorithm
- A novel packed forest representation of formal semantics (λ-expressions), and a novel reduction-based weighted binary SCFG for language generation
- Inspired by the hierarchical phrase-based translation model (Chiang 2005, 2007)
Slide 5: Challenges
How to automatically acquire the lexicon that maps logical terms to natural language words?
Major contribution (2): a novel grammar induction algorithm
- Acquires synchronous grammar rules by learning the correspondence between logical sub-expressions and (possibly discontiguous) natural language word sequences
- Inspired by the hybrid tree model (Lu et al. 2008)
Slide 6: Previous Work
From logical/semantic forms, but not probabilistic:
- Wang (1980), On computational sentence generation from logical form
- Shieber et al. (1990), Semantic-head-driven generation
Probabilistic, but from specialized representations:
- Variable-free tree-structured representations: Wong and Mooney (2007), generation by inverting a semantic parser that uses statistical machine translation; Lu et al. (2009), natural language generation with tree conditional random fields
- Database entries: Angeli et al. (2010), a simple domain-independent probabilistic approach to generation
From formal logical forms, and probabilistic:
- This work
Slide 9: Notes on λ-calculus
Alternative notations for functional application: f g ≡ (f g); f g h ≡ ((f g) h)
Types:
- Basic types: e (entity), t (truth value)
- Composite types: ⟨e,t⟩ takes in type e and returns type t
Conversions:
- α-conversion: λy.state(y) ⇔ λx.state(x)
- β-reduction: (λy.λx.loc(y, x)) miss_r ⇒ λx.loc(miss_r, x)
- (Restricted) higher-order unification (Kwiatkowski et al. 2010): λx.loc(miss_r, x) ∧ state(x) = (λg.λf.λx.g(x) ∧ f(x)) (λx.loc(miss_r, x)) (λx.state(x))
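The conversions above can be sketched in code. This is a minimal sketch, not the authors' implementation: the tuple term encoding and function names are my own, and capture-avoiding substitution is omitted for brevity.

```python
# Toy lambda-term encoding: ('var', name), ('const', name),
# ('lam', name, body), ('app', fun, arg).

def substitute(term, name, value):
    """Replace free occurrences of variable `name` in `term` by `value`.
    NOTE: no capture avoidance (alpha-renaming) -- fine for this tiny example."""
    tag = term[0]
    if tag == 'var':
        return value if term[1] == name else term
    if tag == 'const':
        return term
    if tag == 'lam':
        _, v, body = term
        if v == name:                      # `name` is shadowed; stop here
            return term
        return ('lam', v, substitute(body, name, value))
    _, f, a = term                         # 'app'
    return ('app', substitute(f, name, value), substitute(a, name, value))

def beta_reduce(term):
    """Normalize by repeatedly contracting redexes."""
    tag = term[0]
    if tag == 'lam':
        return ('lam', term[1], beta_reduce(term[2]))
    if tag == 'app':
        f, a = beta_reduce(term[1]), beta_reduce(term[2])
        if f[0] == 'lam':                  # (lam x. body) a => body[x := a]
            return beta_reduce(substitute(f[2], f[1], a))
        return ('app', f, a)
    return term

# (lam y. lam x. loc(y, x)) miss_r  =>  lam x. loc(miss_r, x)
loc_yx = ('app', ('app', ('const', 'loc'), ('var', 'y')), ('var', 'x'))
term = ('app', ('lam', 'y', ('lam', 'x', loc_yx)), ('const', 'miss_r'))
print(beta_reduce(term))                   # lam x. loc(miss_r, x)
```

The printed term is the β-normal form used throughout the later slides.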
Slide 10: Packed Meaning Forest
λx.loc(miss_r, x) ∧ state(x) can be decomposed in two ways:
(1) (λg.λf.λx.g(x) ∧ f(x)) (λx.loc(miss_r, x)) (λx.state(x))
(2) (λg.λf.λx.f(x) ∧ g(x)) (λx.state(x)) (λx.loc(miss_r, x))
(1) "the mississippi runs through which states"; (2) "which states have the mississippi river"
Slide 11: Packed Meaning Forest
(λg.λf.λx.g(x) ∧ f(x)) (λx.loc(miss_r, x)) (λx.state(x))
can be decomposed further, since λx.loc(miss_r, x) = (λy.λx.loc(y, x)) miss_r, giving:
(λg.λf.λx.g(x) ∧ f(x)) ((λy.λx.loc(y, x)) miss_r) (λx.state(x))
Slide 13: Packed Meaning Forest
Forest nodes (type : λ-fragment | reduced expression | associated NL phrase):
- r : ⟨e,t⟩ | λx.loc(miss_r, x) ∧ state(x) | "states that the mississippi river runs through"
- ⟨e,t⟩ : (λg.λf.λx.g(x) ∧ f(x)) ⟨e,t⟩₁ ⟨e,t⟩₂ | λx.loc(miss_r, x) ∧ state(x) | "states that the mississippi river runs through"
- ⟨e,t⟩ : (λg.λf.λx.f(x) ∧ g(x)) ⟨e,t⟩₁ ⟨e,t⟩₂ | λx.loc(miss_r, x) ∧ state(x) | "states that the mississippi river runs through"
- ⟨e,t⟩ : λx.state(x) | λx.state(x) | "states"
- ⟨e,t⟩ : (λy.λx.loc(y, x)) e₁ | λx.loc(miss_r, x) | "that the mississippi river runs through"
- e : miss_r | miss_r | "the mississippi river"
- ⟨e,t⟩ : λx.loc(miss_r, x) | λx.loc(miss_r, x) | "that the mississippi river runs through"
Slide 14: Reduction-based Synchronous CFG
Grammar used for language generation:
(3) ⟨e,t⟩ → ⟨ (λy.λx.loc(y, x)) e₁ , "that e₁ runs through" ⟩
(4) e → ⟨ miss_r , "the mississippi" ⟩
A derivation with (3)+(4):
⟨e,t⟩ : ⟨ λx.loc(miss_r, x) , "that the mississippi runs through" ⟩
where λx.loc(miss_r, x) = (λy.λx.loc(y, x)) miss_r
How do we automatically induce such a grammar from data?
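The composition of rules (3) and (4) above can be sketched as follows. This is a hedged illustration, not the paper's code: the rule encoding, slot markers, and `derive` helper are my own, and the semantics is composed as a string rather than a real λ-term.

```python
# rule (4): e -> < miss_r , "the mississippi" >
rule4 = {"lhs": "e", "sem": "miss_r", "words": ["the", "mississippi"]}

# rule (3): <e,t> -> < (lam y. lam x. loc(y, x)) e_1 , "that e_1 runs through" >
rule3 = {"lhs": "<e,t>",
         "sem": "(lam y. lam x. loc(y, x)) {0}",
         "words": ["that", "[1]", "runs", "through"]}

def derive(rule, child=None):
    """Combine a rule with one child derivation: splice the child's words
    into slot [1] and its semantics into the {0} argument position."""
    words = []
    for tok in rule["words"]:
        words.extend(child["words"] if tok == "[1]" and child else [tok])
    sem = rule["sem"].format(child["sem"]) if child else rule["sem"]
    return {"sem": sem, "words": words}

d4 = derive(rule4)
d3 = derive(rule3, d4)
print(" ".join(d3["words"]))   # that the mississippi runs through
print(d3["sem"])               # (lam y. lam x. loc(y, x)) miss_r
```

The source side of `d3` β-reduces to λx.loc(miss_r, x), matching the derivation on the slide.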
Slide 17: Grammar Induction
Problem: how to find mappings between λ-sub-expressions and NL words in an unsupervised manner?
Challenges:
- Logical forms (e.g., λ-expressions) have complex internal structures and variable dependencies
- Text-to-text aligners (GIZA++, Berkeley aligner) are not applicable
Solution: the λ-hybrid tree, a new generative model that explicitly models the correspondence between λ-sub-expressions and NL word sequences.
Slide 20: λ-hybrid Tree
A tree whose leaves are natural language words and whose internal nodes are λ-productions, generated by an underlying joint generative process.
Extensions to Lu et al. (2008):
- Internal nodes involve λ-expressions
- The meaning representation has a packed-forest representation
Example tree: root ⟨e,t⟩₂ : (λg.λf.λx.g(x) ∧ f(x)) ⟨e,t⟩₁ ⟨e,t⟩₂; child ⟨e,t⟩₂ : λx.state(x) yields "states"; child ⟨e,t⟩₁ : (λy.λx.loc(y, x)) e₁ yields "that e₁ runs through"; e₁ : miss_r yields "the mississippi".
Slide 21: λ-hybrid Tree
Example (partial) hybrid tree T: production p₁ = ⟨e,t⟩ : (λy.λx.loc(y, x)) e covers "that e₁ runs through"; its child p₂ = e : miss_r covers "the mississippi".
P(T) = φ(m: wYw | p₁) × ψ("that e₁ runs through" | p₁) × ρ(p₂ | p₁, arg₁) × φ(m: w | p₂) × ψ("the mississippi" | p₂)
where φ are pattern parameters, ψ are emission parameters, and ρ are MR model parameters.
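The factorization of P(T) on the slide above is just a product of three kinds of parameters. The sketch below is purely illustrative: the numeric parameter values are invented, and only the structure of the product follows the model.

```python
# Invented parameter tables keyed the way the P(T) factorization reads:
phi = {("wYw", "p1"): 0.4, ("w", "p2"): 0.7}          # pattern parameters
psi = {("that e1 runs through", "p1"): 0.05,           # emission parameters
       ("the mississippi", "p2"): 0.1}
rho = {("p2", "p1", "arg1"): 0.6}                      # MR model parameters

# P(T) = phi * psi * rho * phi * psi, one factor per generative decision
p_T = (phi[("wYw", "p1")]
       * psi[("that e1 runs through", "p1")]
       * rho[("p2", "p1", "arg1")]
       * phi[("w", "p2")]
       * psi[("the mississippi", "p2")])
print(p_T)
```

In training these parameters are hidden-structure estimates rather than fixed numbers; the point here is only that each production contributes one pattern factor, one emission factor, and (for each child) one MR model factor.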
Slide 29: λ-hybrid Tree
The same λ-expression λx.loc(miss_r, x) ∧ state(x) paired with hybrid trees in two languages:
- English: root (λg.λf.λx.g(x) ∧ f(x)) ⟨e,t⟩₁ ⟨e,t⟩₂; ⟨e,t⟩₂ : λx.state(x) "states"; ⟨e,t⟩₁ : (λy.λx.loc(y, x)) e₁ "that e₁ runs through"; e₁ : miss_r "the mississippi"
- Chinese: same productions with 密西西比河 ("the Mississippi River") under e₁, 穿越的 ("that ... runs through") under ⟨e,t⟩₁, and 州 ("states") under ⟨e,t⟩₂
Slide 30: λ-hybrid Tree
Which hybrid tree is the correct one? Several segmentations are possible, differing in whether words such as "that", "the", "runs", and "through" attach to the parent production ⟨e,t⟩ : (λy.λx.loc(y, x)) e₁ or to the child e₁ : miss_r.
- Hybrid trees are hidden structures, which need to be estimated with the inside-outside algorithm
- We have developed an efficient algorithm that runs in time cubic in the number of words of the NL sentence
Slide 31: Grammar Induction, Overall Algorithm
1. For each training instance, construct its packed meaning forest.
2. Train the λ-hybrid tree generative model on the training set, find the most probable λ-hybrid tree for each training instance, and extract the grammar rules from it.
Slide 32: Rule Extraction, One-level Rules
From the hybrid tree node ⟨e,t⟩₁ : (λy.λx.loc(y, x)) e₁ covering "that e₁ runs through", extract:
⟨e,t⟩ → ⟨ (λy.λx.loc(y, x)) e₁ , "that e₁ runs through" ⟩
Slide 33: Rule Extraction, Subtree Rules
From the subtree rooted at ⟨e,t⟩₁, with the child e₁ : miss_r covering "the mississippi", extract:
⟨e,t⟩ → ⟨ λx.loc(miss_r, x) , "that the mississippi runs through" ⟩
Slide 34: Rule Extraction, Two-level Rules
Combining the root production with one child, via substitution, β-reductions, and α-conversion, yields:
⟨e,t⟩ → ⟨ (λy.λx.loc(y, x) ∧ state(x)) e₁ , "states that e₁ runs through" ⟩
Slide 35: Rule Extraction, Two-level Rules
Derivation of the two-level rule (substitution, then β-reductions, then α-conversion):
(λy.[(λg.λf.λx.g(x) ∧ f(x)) ((λy.λx.loc(y, x)) y) (λx.state(x))]) e₁
⇒ (λy.λx.loc(y, x) ∧ state(x)) e₁
Extracted rule: ⟨e,t⟩ → ⟨ (λy.λx.loc(y, x) ∧ state(x)) e₁ , "states that e₁ runs through" ⟩
Slide 39: Log-linear Model
We assign a score to each derivation D:
w(D) = ∏_{r ∈ D} ∏_i f_i(r)^{w_i} × p_LM(ŝ)^{w_LM}
Four simple and general features: 3 rule-specific features and 1 language model (LM) feature.
Feature weights are learned with Minimum Error Rate Training (Och 2003).
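The score w(D) is usually computed in log space. A minimal sketch, with invented feature values and weights (only the formula's shape comes from the slide):

```python
import math

def score(derivation_feats, w, p_lm, w_lm):
    """log w(D) = sum over rules r, features i of w_i * log f_i(r)
                  + w_LM * log p_LM(s-hat).
    derivation_feats: one positive feature vector f_i(r) per rule r in D."""
    log_w = sum(wi * math.log(fi)
                for feats in derivation_feats
                for wi, fi in zip(w, feats))
    return log_w + w_lm * math.log(p_lm)

w = [0.5, 0.3, 0.2]            # three rule-specific feature weights (invented)
feats = [[0.4, 0.9, 1.0],      # rule 1's feature values (invented)
         [0.7, 0.8, 1.0]]      # rule 2's feature values (invented)
print(score(feats, w, p_lm=0.01, w_lm=1.0))
```

Working in log space turns the double product into a double sum, which is what MERT-style tuning and the decoder actually manipulate.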
Slide 40: Decoding
Forest-to-string decoding: for a given source expression e, find the most probable derivation D, as scored by w, that produces e; the target side gives the generated sentence ŝ:
ŝ = s( argmax_{D : e(D) = e} w(D) )
A bottom-up dynamic programming algorithm with cube pruning.
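The "cube" part of cube pruning can be illustrated on its own. A toy sketch (my own code, not the paper's decoder): given two k-best child lists sorted by score, a heap pops the best combined hypotheses lazily instead of enumerating the full cross-product.

```python
import heapq

def cube_top_k(left, right, k, combine=lambda a, b: a * b):
    """left/right: child scores in descending order; returns top-k combined
    scores, exploring the (i, j) grid outward from the corner (0, 0)."""
    heap = [(-combine(left[0], right[0]), 0, 0)]
    seen, out = {(0, 0)}, []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        for di, dj in ((1, 0), (0, 1)):     # push the two cube neighbors
            ni, nj = i + di, j + dj
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-combine(left[ni], right[nj]), ni, nj))
    return out

print(cube_top_k([0.9, 0.5, 0.1], [0.8, 0.4], k=3))  # approx [0.72, 0.4, 0.36]
```

With a pure product of probabilities this enumeration is exact; in the real decoder the LM feature makes combined scores non-monotone over the grid, so cube pruning is an approximate search.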
Slide 41: Automatic Evaluation
The Geoquery dataset (880 instances), annotated with complete sentences in both English and Chinese.
[Table: BLEU-1 and TER for English and Chinese, comparing Moses and Joshua baselines (text, preorder, inorder, and postorder input linearizations) against this work]
p < 0.01 for all cases, except for the comparison against Joshua-preorder (p < 0.05)
Slide 42: Importance of Different Rules
Subtree rules and two-level rules are capable of modeling some longer-range dependencies.
[Table: BLEU-1 and TER for English and Chinese, comparing the full system (all rules) against ablations without subtree rules and without two-level rules]
Slide 43: Human Evaluation
Randomly sampled 50% of the test examples; five judges each for both languages.
English (Flu, Sem): Moses 4.48 ± …, … ± 0.20; Joshua 4.40 ± …, … ± 0.18; This work 4.66 ± …, … ± 0.16
Chinese (Flu, Sem): Moses 4.14 ± …, … ± 0.17; Joshua 4.00 ± …, … ± 0.21; This work 4.59 ± …, … ± 0.10
p < 0.01 for all cases
Slide 44: Variable-free Datasets
The model can also be applied to variable-free datasets with tree-structured representations. For example, midfield(opp) is converted to (λx.midfield(x)) opp.
[Table: BLEU and NIST on Robocup (300) and Geoquery (880), comparing Wong and Mooney (2007) and Lu et al. (2009) against this work]
Slide 45: Conclusions
- Introduced a novel reduction-based binary SCFG with a forest-to-string algorithm for language generation from typed lambda calculus expressions represented as a packed meaning forest.
- Introduced a novel grammar induction algorithm, built on top of the λ-hybrid tree model, which models the joint generative process of both λ-expressions and natural language texts.
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationOut of GIZA Efficient Word Alignment Models for SMT
Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza
More informationOverview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments
Overview Learning phrases from alignments 6.864 (Fall 2007) Machine Translation Part III A phrase-based model Decoding in phrase-based models (Thanks to Philipp Koehn for giving me slides from his EACL
More informationProbabilistic Context-Free Grammar
Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612
More informationA DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005
A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):
More informationParsing with Context-Free Grammars
Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars
More informationEmpirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs
Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21
More informationCS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses)
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya CSE Dept., IIT Bombay 15 th Feb, 2011 Going forward
More informationCS460/626 : Natural Language
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 27 SMT Assignment; HMM recap; Probabilistic Parsing cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 17 th March, 2011 CMU Pronunciation
More informationDiscriminative Training
Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder
More informationNatural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Natural Language Processing CS 6840 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Parsing Define a probabilistic model of syntax P(T S):
More informationAdvances in Abstract Categorial Grammars
Advances in Abstract Categorial Grammars Language Theory and Linguistic Modeling Lecture 3 Reduction of second-order ACGs to to Datalog Extension to almost linear second-order ACGs CFG recognition/parsing
More informationOn rigid NL Lambek grammars inference from generalized functor-argument data
7 On rigid NL Lambek grammars inference from generalized functor-argument data Denis Béchet and Annie Foret Abstract This paper is concerned with the inference of categorial grammars, a context-free grammar
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationSoft Inference and Posterior Marginals. September 19, 2013
Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference
More informationA* Search. 1 Dijkstra Shortest Path
A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the
More informationA Stochastic l-calculus
A Stochastic l-calculus Content Areas: probabilistic reasoning, knowledge representation, causality Tracking Number: 775 Abstract There is an increasing interest within the research community in the design
More informationFast Consensus Decoding over Translation Forests
Fast Consensus Decoding over Translation Forests John DeNero Computer Science Division University of California, Berkeley denero@cs.berkeley.edu David Chiang and Kevin Knight Information Sciences Institute
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationUnit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing
CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang (lhuang@isi.edu) Big Picture we have already covered...
More informationShift-Reduce Word Reordering for Machine Translation
Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,
More informationIBM Model 1 for Machine Translation
IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of
More informationDiscrimina)ve Latent Variable Models. SPFLODD November 15, 2011
Discrimina)ve Latent Variable Models SPFLODD November 15, 2011 Lecture Plan 1. Latent variables in genera)ve models (review) 2. Latent variables in condi)onal models 3. Latent variables in structural SVMs
More informationGlobal Machine Learning for Spatial Ontology Population
Global Machine Learning for Spatial Ontology Population Parisa Kordjamshidi, Marie-Francine Moens KU Leuven, Belgium Abstract Understanding spatial language is important in many applications such as geographical
More informationParts 3-6 are EXAMPLES for cse634
1 Parts 3-6 are EXAMPLES for cse634 FINAL TEST CSE 352 ARTIFICIAL INTELLIGENCE Fall 2008 There are 6 pages in this exam. Please make sure you have all of them INTRODUCTION Philosophical AI Questions Q1.
More informationShift-Reduce Word Reordering for Machine Translation
Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,
More informationTribhuvan University Institute of Science and Technology Micro Syllabus
Tribhuvan University Institute of Science and Technology Micro Syllabus Course Title: Discrete Structure Course no: CSC-152 Full Marks: 80+20 Credit hours: 3 Pass Marks: 32+8 Nature of course: Theory (3
More informationNotes on the framework of Ando and Zhang (2005) 1 Beyond learning good functions: learning good spaces
Notes on the framework of Ando and Zhang (2005 Karl Stratos 1 Beyond learning good functions: learning good spaces 1.1 A single binary classification problem Let X denote the problem domain. Suppose we
More informationN-gram Language Modeling
N-gram Language Modeling Outline: Statistical Language Model (LM) Intro General N-gram models Basic (non-parametric) n-grams Class LMs Mixtures Part I: Statistical Language Model (LM) Intro What is a statistical
More informationLagrangian Relaxation Algorithms for Inference in Natural Language Processing
Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Alexander M. Rush and Michael Collins (based on joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Roi Reichart, David
More informationMachine Translation without Words through Substring Alignment
Machine Translation without Words through Substring Alignment Graham Neubig 1,2,3, Taro Watanabe 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 2 3 now at 1 Machine Translation Translate a source sentence F
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Raghavendra Udupa U Hemanta K Mai IBM India Research Laboratory, New Delhi {uraghave, hemantkm}@inibmcom Abstract Viterbi
More informationPersonal Project: Shift-Reduce Dependency Parsing
Personal Project: Shift-Reduce Dependency Parsing 1 Problem Statement The goal of this project is to implement a shift-reduce dependency parser. This entails two subgoals: Inference: We must have a shift-reduce
More information