Aspects of Tree-Based Statistical Machine Translation
1 Aspects of Tree-Based Statistical Machine Translation. Marcello Federico (based on slides by Gabriele Musillo), Human Language Technology, FBK-irst, 2011
2 Outline. Tree-based translation models: synchronous context-free grammars; BTG alignment model; hierarchical phrase-based model. Decoding with SCFGs: translation as parsing; DP-based chart decoding; integration of language model scores. Learning SCFGs: rule extraction from phrase tables.
3 Tree-Based Translation Models. Levels of representation in machine translation (π = tree, σ = string; source on one side, target on the other): π → σ: tree-to-string; σ → π: string-to-tree; π → π: tree-to-tree. Appropriate levels of representation?
4 Tree Structures. Example: the parse tree of "Pierre Vinken will join the board": (S (NP (NNP Pierre) (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (Det the) (NN board))))). Syntactic structures: rooted ordered trees; internal nodes labeled with syntactic categories; leaf nodes labeled with words; linear and hierarchical relations between nodes.
5 Tree-to-Tree Translation Models. Example: isomorphic parse trees for English "Pierre Vinken will join the board" and German "Pierre Vinken wird dem Vorstand beitreten". Syntactic generalizations over pairs of languages: isomorphic trees; syntactically informed unbounded reordering; formalized as derivations in synchronous grammars. Adequacy of the isomorphism assumption?
6 Context-Free Grammars. CFG (Chomsky, 1956): formal model of languages, more expressive than FAs and REs; first used in linguistics to describe embedded and recursive structures. CFG rules: left-hand side: a nonterminal symbol; right-hand side: a string of nonterminal or terminal symbols; a distinguished start nonterminal symbol. Example: S → 0S1 (S rewrites as 0S1) and S → ɛ (S rewrites as ɛ), generating the language {0^n 1^n}.
7 CFG Derivations. Generative process: 1. Write down the start nonterminal symbol. 2. Choose a rule whose left-hand side is the left-most written-down nonterminal symbol. 3. Replace the symbol with the right-hand side of that rule. 4. Repeat from step 2 while nonterminal symbols are written down. Example derivation with the grammar above, with the corresponding parse tree: S ⇒ 0S1 ⇒ 00S11 ⇒ 0011.
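The four-step process above can be sketched in Python for the toy grammar S → 0S1 | ɛ (the encoding of rules as symbol lists and the `choices` argument, which makes the derivation deterministic, are my own devices):

```python
# Toy grammar S -> 0S1 | epsilon; each rule is a list of right-hand-side
# symbols, with the empty list standing for epsilon.
RULES = {"S": [["0", "S", "1"], []]}

def derive(choices):
    """Left-most derivation: write down the start symbol, then for each
    choice rewrite the left-most nonterminal with the chosen rule."""
    sent = ["S"]  # step 1: write down the start nonterminal
    for c in choices:
        # step 2: locate the left-most nonterminal
        idx = next(i for i, sym in enumerate(sent) if sym in RULES)
        # step 3: replace it with the chosen right-hand side
        sent[idx:idx + 1] = RULES[sent[idx]][c]
    return "".join(sent)  # step 4 ends when the choices are exhausted

# S => 0S1 => 00S11 => 0011
print(derive([0, 0, 1]))
```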
8 CFG Formal Definitions. A CFG is a tuple G = (V, Σ, R, S): V: finite set of nonterminal symbols; Σ: finite set of terminals, disjoint from V; R: finite set of rules α → β, with α a nonterminal and β a string of terminals and nonterminals; S: the start nonterminal symbol. Let u, v be strings over V ∪ Σ, and α → β ∈ R; then we say: uαv yields uβv, written uαv ⇒ uβv; u derives v, written u ⇒* v, if u = v or a sequence u1, u2, ..., uk exists for k ≥ 0 such that u ⇒ u1 ⇒ u2 ⇒ ... ⇒ uk ⇒ v. L(G) = {w ∈ Σ* : S ⇒* w}.
9 CFG Examples. G1: R = {S → NP VP, NP → N | DET N | N PP, VP → V NP | V NP PP, PP → P NP, DET → the | a, N → Alice | Bob | trumpet, V → chased, P → with}. G3: R = {NP → NP CONJ NP | NP PP | DET N, PP → P NP, P → of, DET → the | two | three, N → mother | pianists | singers, CONJ → and}. Derivations of "the mother of three pianists and two singers"? Derivations of "Alice chased Bob with the trumpet"? The same parse tree can be derived in different ways (order of rule applications); the same sentence can have different parse trees (choice of rules).
10 Transduction Grammars, aka Synchronous Grammars. STG (Lewis and Stearns, 1968; Aho and Ullman, 1969): two or more strings derived simultaneously; more powerful than FSTs; used in NLP to model alignments, unbounded reordering, and mappings from surface forms to logical forms. Synchronous rules: left-hand side nonterminal symbol associated with source and target right-hand sides; a bijection [·] mapping nonterminals in the source right-hand side to nonterminals in the target right-hand side. Example (infix to Polish notation): E → E[1] + E[3] / + E[1] E[3]; E → E[1] E[2] / E[1] E[2]; E → n / n, for n ∈ N.
11 Synchronous CFG. 1-to-1 correspondence between nodes; isomorphic derivation trees; uniquely determined word alignment.
12 Bracketing Transduction Grammars. BTG (Wu, 1997): special form of SCFG with only one nonterminal X. Nonterminal rules: X → X[1] X[2] / X[1] X[2] (monotone rule); X → X[1] X[2] / X[2] X[1] (inversion rule). Preterminal rules (lexical translation rules): X → f / e, where f ∈ Vs ∪ {ɛ} and e ∈ Vt ∪ {ɛ}.
13 SCFG Derivations. Generative process: 1. Write down the source and target start symbols. 2. Choose a synchronous rule whose left-hand side is the left-most written-down source nonterminal symbol. 3. Simultaneously rewrite the source symbol and its corresponding target symbol with the source and target sides of the rule, respectively. 4. Repeat from step 2 while source and target nonterminal symbols are written down. Example rules: X → X[1] X[2] / X[1] X[2]; X → X[1] X[2] / X[2] X[1]; X → k / k, for k ∈ {1, 2, 3}.
14 BTG Alignments. Example: a synchronous derivation with the rules above (nonterminals re-indexed at each step) simultaneously yields the source string, the target string, and their word alignment.
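A minimal sketch of how the monotone and inversion rules compose target strings and word alignments (the item representation, a target word list plus a list of source-position/target-position links, is my own):

```python
def lex(f_pos, e_word):
    """Lexical rule X -> f / e: one target word aligned to source position f_pos."""
    return ([e_word], [(f_pos, 0)])

def monotone(a, b):
    """Monotone rule X -> X1 X2 / X1 X2: target order follows source order."""
    e1, links1 = a
    e2, links2 = b
    # target positions of the second item shift right by len(e1)
    return (e1 + e2, links1 + [(f, e + len(e1)) for f, e in links2])

def inverted(a, b):
    """Inversion rule X -> X1 X2 / X2 X1: the two sub-translations swap on
    the target side; source positions are absolute, so this is simply the
    monotone combination with swapped arguments."""
    return monotone(b, a)

# Source "f0 f1" translated with inversion: target order becomes e1 e0.
words, links = inverted(lex(0, "e0"), lex(1, "e1"))
```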
15 Phrase-Based Models and SCFGs. SCFG formalization of phrase-based translation models: for each phrase pair (f̃, ẽ), make a rule X → f̃ / ẽ; make monotone rules X → X[1] X[2] / X[1] X[2]; make reordering rules X → X[1] / X[1] and X → X[1] X[2] / X[2] X[1]. Completeness? Correctness?
16 Hierarchical Phrase-Based Models. HPBM (Chiang, 2007): first tree-to-tree approach to perform better than phrase-based systems in large-scale evaluations; discontinuous phrases; long-range reordering rules; formalized as synchronous context-free grammars.
17 HPBM: Motivations. Typical phrase-based Chinese-English translation problems: Chinese VPs follow PPs / English VPs precede PPs: X → yu X[1] you X[2] / have X[2] with X[1]; Chinese NPs follow RCs / English NPs precede RCs: X → X[1] de X[2] / the X[2] that X[1]; translation of the zhiyi construct in English word order: X → X[1] zhiyi / one of X[1].
18 HPBM: Example Rules. S → X[1] / X[1] (1); S → S[1] X[2] / S[1] X[2] (2); X → yu X[1] you X[2] / have X[2] with X[1] (3); X → X[1] de X[2] / the X[2] that X[1] (4); X → X[1] zhiyi / one of X[1] (5); X → Aozhou / Australia (6); X → Beihan / N. Korea (7); X → shi / is (8); X → bangjiao / dipl. rels. (9); X → shaoshu guojia / few countries (10).
19-30 HPBM: Example Translation, Steps 1-12. Starting from the paired start symbols, the glue rules (2) and (1) are applied, then the lexical rules X → Aozhou / Australia and X → shi / is, then the hierarchical rules X → X[1] zhiyi / one of X[1], X → X[1] de X[2] / the X[2] that X[1], and X → yu X[1] you X[2] / have X[2] with X[1], and finally the remaining lexical rules X → Beihan / N. Korea, X → bangjiao / dipl. rels., and X → shaoshu guojia / few countries. The completed derivation yields: Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi / Australia is one of the few countries that have dipl. rels. with N. Korea.
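The derivation above can be replayed as plain string substitution into the linked nonterminal slots (the rule encoding with `X1`/`X2` placeholders and the helper `subst` are my own devices):

```python
# Each rule: (source side, target side); X1/X2 are linked nonterminal slots.
R3 = ("yu X1 you X2", "have X2 with X1")
R4 = ("X1 de X2", "the X2 that X1")
R5 = ("X1 zhiyi", "one of X1")
LEX = {"Aozhou": "Australia", "Beihan": "N. Korea", "shi": "is",
       "bangjiao": "dipl. rels.", "shaoshu guojia": "few countries"}

def subst(rule, children):
    """Substitute child (source, target) pairs into the linked slots."""
    src, tgt = rule
    for i, (cs, ct) in enumerate(children, start=1):
        src = src.replace(f"X{i}", cs)
        tgt = tgt.replace(f"X{i}", ct)
    return src, tgt

# Build the example derivation bottom-up:
inner = subst(R3, [("Beihan", LEX["Beihan"]), ("bangjiao", LEX["bangjiao"])])
rel = subst(R4, [inner, ("shaoshu guojia", LEX["shaoshu guojia"])])
top = subst(R5, [rel])
src = "Aozhou shi " + top[0]
tgt = "Australia is " + top[1]
```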
31 Summary. Synchronous context-free grammars: formal model to synchronize source and target derivation processes; BTG alignment model; HPB recursive reordering model. Additional topics (optional): decoding SCFGs: translation as parsing; learning SCFGs from phrase tables.
32 Synchronous Context-Free Grammars. SCFGs: CFGs in two dimensions; synchronous derivation of isomorphic trees (isomorphic excluding the leaves); unbounded reordering preserving hierarchy. Example rules (English/Japanese): VB → PRP[1] VB1[2] VB2[3] / PRP[1] VB2[3] VB1[2]; VB2 → VB[1] TO[2] / TO[2] VB[1] ga; TO → TO[1] NN[2] / NN[2] TO[1]; PRP → he / kare ha; together with the remaining lexical rules, these derive "he adores listening to music" / "kare ha ongaku wo kiku no ga daisuki desu".
33 Weighted SCFGs. Rules A → α/β are associated with positive weights w(A → α/β); derivation trees π = ⟨π1, π2⟩ are weighted as W(π) = ∏_{A→α/β ∈ G} w(A → α/β)^{f(A→α/β, π)}, where f(A→α/β, π) counts the occurrences of the rule in π. Probabilistic SCFGs if the following conditions hold: w(A → α/β) ∈ [0, 1] and ∑_{α,β} w(A → α/β) = 1. Notice: SCFGs might well include rules A → α/β1, ..., A → α/βk sharing the same source side.
34 MAP Translation Problem. Maximum a posteriori translation: e* = argmax_e p(e | f) = argmax_e ∑_{π ∈ Π(f,e)} p(e, π | f), where Π(f, e) is the set of synchronous derivation trees yielding ⟨f, e⟩. Exact MAP decoding is NP-hard (Sima'an, 1996; Satta and Peserico, 2005).
35 Viterbi Approximation. Tractable approximate decoding: e* = argmax_e ∑_{π ∈ Π(f,e)} p(e, π | f) ≈ argmax_e max_{π ∈ Π(f,e)} p(e, π | f) = E(argmax_{π ∈ Π(f)} p(π)), where Π(f) is the set of synchronous derivations yielding f and E(π) is the target string resulting from the synchronous derivation π.
36 Translation as Parsing. Parsing solution: π* = argmax_{π ∈ Π(f)} p(π). 1. Compute the most probable derivation tree that generates f using the source dimension of the WSCFG. 2. Build the translation string e by applying the target dimension of the rules used in the most probable derivation. The most probable derivation is computed in O(n^3) using dynamic programming algorithms for parsing weighted CFGs; transfer of algorithms and optimizations developed for CFG parsing to MT.
37 Translation as Parsing: Illustration
38 Weighted CFGs in Chomsky Normal Form. WCFGs: rules A → α associated with positive weights w(A → α); derivation trees π weighted as W(π) = ∏_{A→α ∈ G} w(A → α)^{f(A→α, π)}; probabilistic CFGs if the following conditions hold: w(A → α) ∈ [0, 1] and ∑_α w(A → α) = 1. WCFGs in CNF: rules restricted to the forms A → BC or A → a; equivalence holds between WCFGs and WCFGs in CNF; no analogous equivalence holds for weighted SCFGs.
39 Translation as Weighted CKY Parsing. Given a WSCFG G and a source string f: 1. project G onto its source WCFG G': A →_w α ∈ G' if A →_w α/β ∈ G and w is maximal among the rules of G with source side A → α; 2. transform G' into its CNF G''; 3. solve π''* = argmax_{π'' ∈ Π_{G''}(f)} p(π'') with the CKY algorithm; 4. revert π''* into π'*, the derivation tree according to G'; 5. map π'* into its corresponding target tree π*; 6. read off the translation e from π*.
40 Weighted CKY Parsing. Dynamic programming: recursive division of problems into subproblems; optimal solutions compose optimal sub-solutions (Bellman's principle); tabulation of subproblems and their solutions. CKY parsing: subproblems: parsing substrings of the input string u1 ... un; solutions to subproblems tabulated using a chart; bottom-up algorithm starting with derivations of terminals; O(n^3 |G|) time complexity; widely used to perform statistical inference over random trees.
41 Weighted CKY Parsing. Problem: π* = argmax_{π ∈ Π_G(u = u1...un)} p(π). DP chart: M[i,k,A] = maximum probability of A ⇒* u_{i+1} ... u_k. Base case (k − i = 1): M[i−1,i,A] = w(A → u_i) for 1 ≤ i ≤ n. Inductive case (k − i > 1): M[i,k,A] = max_{B,C, i<j<k} {w(A → B C) · M[i,j,B] · M[j,k,C]}. The best derivation is built by storing ⟨j, B, C⟩ for each M[i,k,A].
42 Weighted CKY Parsing. M[i,k,A] = max_{B,C, i<j<k} {w(A → B C) · M[i,j,B] · M[j,k,C]}: A spans u_{i+1} ... u_k, with B spanning u_{i+1} ... u_j and C spanning u_{j+1} ... u_k.
43 Weighted CKY Pseudo-Code
1: M[i,j,A] = 0 for all A and 0 ≤ i < j ≤ n;
2: for i = 1 to n do {base case: substrings of length 1}
3:   M[i−1,i,A] = w(A → u_i);
{inductive case: substrings of length > 1}
4: for l = 2 to n do {l: length of the substring}
5:   for i = 0 to n − l do {i: start position of the substring}
6:     k = i + l; {k: end position of the substring}
7:     for j = i + 1 to k − 1 do {j: split position}
8:       for A → BC do
9:         q = w(A → BC) · M[i,j,B] · M[j,k,C];
10:        if q > M[i,k,A] then
11:          M[i,k,A] = q;
12:          D[i,k,A] = ⟨j, B, C⟩; {backpointers to build the derivation tree}
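The pseudo-code translates almost line-for-line into Python; a minimal runnable sketch for a WCFG in CNF (the dictionary-based grammar encoding is my own):

```python
from collections import defaultdict

def cky(words, lex, bin_rules):
    """Weighted CKY: lex[(A, a)] and bin_rules[(A, B, C)] map rules to
    weights. Returns the chart M and backpointers D; M[(i, k, A)] is the
    maximum probability of deriving words[i:k] from A."""
    n = len(words)
    M, D = defaultdict(float), {}
    for i, a in enumerate(words):                     # base case: length 1
        for (A, w), p in lex.items():
            if w == a:
                M[(i, i + 1, A)] = p
    for l in range(2, n + 1):                         # substring length
        for i in range(0, n - l + 1):                 # start position
            k = i + l                                 # end position
            for j in range(i + 1, k):                 # split position
                for (A, B, C), p in bin_rules.items():
                    q = p * M[(i, j, B)] * M[(j, k, C)]
                    if q > M[(i, k, A)]:
                        M[(i, k, A)] = q
                        D[(i, k, A)] = (j, B, C)      # backpointer
    return M, D

# Toy grammar: S -> A B (0.9), A -> a (1.0), B -> b (1.0)
lex = {("A", "a"): 1.0, ("B", "b"): 1.0}
rules = {("S", "A", "B"): 0.9}
M, D = cky(["a", "b"], lex, rules)
```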
44 Parsing SCFG and Language Modelling. Viterbi decoding of WSCFGs: focus on the most probable derivation of the source (ignoring different target sides associated with the same source side); derivation weights do not include language model scores. How to efficiently compute target language model scores for all possible derivations? Approaches: 1. rescoring: generate k-best candidate translations and rerank the k-best list with the LM; 2. online: integrate target m-gram LM scores into dynamic programming parsing; 3. cube pruning (Huang and Chiang, 2007): rescore k-best sub-translations at each node of the parse forest.
45 Online Translation. Parsing of the source string and building of the corresponding sub-translations proceed in parallel: from antecedents PP[1,3]: (w1, t1) and VP[3,6]: (w2, t2), derive VP[1,6]: (w · w1 · w2, t2 t1), where w1, w2 are the weights of the two antecedents, w is the weight of the synchronous rule, and t1, t2 are the sub-translations.
46 LM Online Integration (Wu, 1996). Bigram online integration: from PP[with ... Sharon][1,3]: (w1, t1) and VP[held ... talk][3,6]: (w2, t2), derive VP[held ... Sharon][1,6]: (w · w1 · w2 · p_LM(with | talk), t2 t1); each item records its boundary words, so only the bigram at the seam needs a new LM score.
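A sketch of the combination step with a bigram LM (the LM table and all probability values below are made up for illustration):

```python
def combine(w_rule, item1, item2, lm):
    """Combine two antecedents under a rule whose target side is t2 t1
    (as in the example above): only the bigram at the seam between t2
    and t1 needs a new LM score."""
    w1, t1 = item1
    w2, t2 = item2
    p_seam = lm.get((t2[-1], t1[0]), 1e-4)  # p_LM(first(t1) | last(t2))
    return (w_rule * w1 * w2 * p_seam, t2 + t1)

lm = {("talk", "with"): 0.2}            # illustrative bigram probability
pp = (0.5, ["with", "Sharon"])          # PP[1,3]: (w1, t1)
vp = (0.4, ["held", "a", "talk"])       # VP[3,6]: (w2, t2)
score, words = combine(0.8, pp, vp, lm)
```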
47 Cube Pruning (Huang and Chiang, 2007). Beam search: at each step in the derivation, keep at most k items integrating target sub-translations in a beam; enumerate all possible combinations of LM items; extract the k-best combinations. Cube pruning: get the k-best LM items without computing all possible combinations; approximate beam search: prone to search errors (which in practice matter much less than the efficiency gained in decoding).
48 Cube Pruning. Heuristic assumption: the best adjacent items lie towards the upper-left corner of the grid, so part of the grid can be pruned without computing its cells.
49-52 Cube Pruning: Example (a sequence of figures stepping through the grid).
53 Cube Pruning: Pseudo-Code. To efficiently compute a small corner of the grid: push the cost of grid cell (1,1) onto a priority queue; repeat: 1. extract the best cell from the queue; 2. push the costs of the best cell's neighbours onto the queue; until k cells have been extracted (other termination conditions are possible).
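For the two-dimensional case of combining two sorted candidate lists, the loop above can be sketched with a priority queue (`cube_top_k` and its additive cost function are my own names; the costs are illustrative):

```python
import heapq

def cube_top_k(xs, ys, k, cost):
    """Lazily enumerate the k best combinations of two sorted lists,
    starting from cell (0, 0) and pushing each popped cell's neighbours."""
    heap = [(cost(xs[0], ys[0]), 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        c, i, j = heapq.heappop(heap)            # extract best cell
        out.append((c, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):  # push its neighbours
            if ni < len(xs) and nj < len(ys) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (cost(xs[ni], ys[nj]), ni, nj))
    return out

best = cube_top_k([1.0, 2.0, 5.0], [1.0, 3.0], 3, lambda a, b: a + b)
```

Note that only the visited cells and their neighbours are ever scored; with a monotone-enough cost function the rest of the grid is never touched.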
54 Summary. Translation as parsing: Viterbi approximation; weighted CKY parsing; online LM integration and cube pruning. Next session: learning SCFGs and Hiero.
55 Hierarchical Phrase-Based Models. Hiero (Chiang, 2005, 2007): SCFG of rank 2 with only two nonterminal symbols; discontinuous phrases; long-range reordering rules.
56 Hiero Synchronous Rules. Rule extraction: (a) take a word-aligned sentence pair; (b) extract initial phrase pairs; (c) replace sub-phrases inside phrases with the symbol X. Glue rules: S → S[1] X[2] / S[1] X[2]; S → X[1] / X[1]. Rule filtering: limited length of initial phrases; no adjacent nonterminals on the source side; at least one pair of aligned words in non-glue rules.
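Step (c) can be sketched as subtracting a sub-phrase pair from an initial phrase pair (this string-replacement version is my own simplification of the span-based extraction in Chiang, 2007):

```python
def make_hiero_rule(phrase, subphrase, slot="X1"):
    """Replace a sub-phrase pair occurring inside an initial phrase pair
    with a linked nonterminal slot, yielding a hierarchical rule."""
    (f, e), (sf, se) = phrase, subphrase
    assert sf in f and se in e, "sub-phrase must occur on both sides"
    return (f.replace(sf, slot, 1), e.replace(se, slot, 1))

# Initial phrase pair minus the sub-phrase pair (Beihan, N. Korea):
rule = make_hiero_rule(
    ("yu Beihan you bangjiao", "have dipl. rels. with N. Korea"),
    ("Beihan", "N. Korea"))
```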
57-63 Hiero: Rule Extraction (a sequence of figures illustrating phrase-pair and rule extraction from a word-aligned sentence pair).
64 Hiero: Log-Linear Parametrization. Scoring rules: s(A → γ) = λ · h(A → γ), where h(A → γ) ∈ R^m is the feature representation vector, λ ∈ R^m is the weight vector, h_r(A → γ) is the value of the r-th feature, and λ_r is the weight of the r-th feature. Scoring derivations: s(π) = λ_LM · log p(E(π)) + ∑_{⟨Z→γ,i,j⟩ ∈ π} s(Z → γ); derivation scores decompose into a sum of rule scores; p(E(π)) is the LM score, computed while parsing.
65 Hiero: Feature Representation. Word translation features: h1(X → α/β) = log p(T_β | T_α); h2(X → α/β) = log p(T_α | T_β). Word penalty feature: h3(X → α/β) = |T_β|. Synchronous features: h4(X → α/β) = log p(β | α); h5(X → α/β) = log p(α | β). Glue penalty feature: h6(S → S[1] X[2] / S[1] X[2]) = 1. Phrase penalty feature: h7(X → α/β) = 1. The weights λ_i are tuned on a dev set using MERT.
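Since s(A → γ) = λ · h(A → γ) is a dot product, rule scoring is a one-liner (the feature values and weights below are made up for illustration):

```python
def score_rule(h, lam):
    """Log-linear rule score: dot product of feature vector and weights."""
    assert len(h) == len(lam)
    return sum(hi * li for hi, li in zip(h, lam))

# h1..h7 for one rule (illustrative): log-probs, word penalty, penalties
h = [-1.2, -0.9, 3.0, -2.1, -1.8, 0.0, 1.0]
lam = [0.3, 0.3, -0.1, 0.2, 0.2, -0.5, -0.2]
s = score_rule(h, lam)
```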
66 Summary. Hiero: rule extraction; log-linear parametrization; feature representation.
Algorithms & Models of Computation CS/ECE 374, Fall 2017 Even More on Dynamic Programming Lecture 15 Thursday, October 19, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 26 Part I Longest Common Subsequence
More informationCross-Entropy and Estimation of Probabilistic Context-Free Grammars
Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Anna Corazza Department of Physics University Federico II via Cinthia I-8026 Napoli, Italy corazza@na.infn.it Giorgio Satta Department
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationCS 188 Introduction to AI Fall 2005 Stuart Russell Final
NAME: SID#: Section: 1 CS 188 Introduction to AI all 2005 Stuart Russell inal You have 2 hours and 50 minutes. he exam is open-book, open-notes. 100 points total. Panic not. Mark your answers ON HE EXAM
More informationEinführung in die Computerlinguistik
Einführung in die Computerlinguistik Context-Free Grammars formal properties Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2018 1 / 20 Normal forms (1) Hopcroft and Ullman (1979) A normal
More informationPhrase-Based Statistical Machine Translation with Pivot Languages
Phrase-Based Statistical Machine Translation with Pivot Languages N. Bertoldi, M. Barbaiani, M. Federico, R. Cattoni FBK, Trento - Italy Rovira i Virgili University, Tarragona - Spain October 21st, 2008
More informationContext Free Grammars
Automata and Formal Languages Context Free Grammars Sipser pages 101-111 Lecture 11 Tim Sheard 1 Formal Languages 1. Context free languages provide a convenient notation for recursive description of languages.
More informationContext-Free Grammars and Languages
Context-Free Grammars and Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationNotes for Comp 497 (454) Week 10
Notes for Comp 497 (454) Week 10 Today we look at the last two chapters in Part II. Cohen presents some results concerning the two categories of language we have seen so far: Regular languages (RL). Context-free
More informationLagrangian Relaxation Algorithms for Inference in Natural Language Processing
Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Alexander M. Rush and Michael Collins (based on joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Roi Reichart, David
More informationAttendee information. Seven Lectures on Statistical Parsing. Phrase structure grammars = context-free grammars. Assessment.
even Lectures on tatistical Parsing Christopher Manning LA Linguistic Institute 7 LA Lecture Attendee information Please put on a piece of paper: ame: Affiliation: tatus (undergrad, grad, industry, prof,
More informationParsing. Unger s Parser. Laura Kallmeyer. Winter 2016/17. Heinrich-Heine-Universität Düsseldorf 1 / 21
Parsing Unger s Parser Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2016/17 1 / 21 Table of contents 1 Introduction 2 The Parser 3 An Example 4 Optimizations 5 Conclusion 2 / 21 Introduction
More informationHarvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars
Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars Salil Vadhan October 2, 2012 Reading: Sipser, 2.1 (except Chomsky Normal Form). Algorithmic questions about regular
More informationNLP Programming Tutorial 11 - The Structured Perceptron
NLP Programming Tutorial 11 - The Structured Perceptron Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Prediction Problems Given x, A book review Oh, man I love this book! This book is
More informationMarrying Dynamic Programming with Recurrent Neural Networks
Marrying Dynamic Programming with Recurrent Neural Networks I eat sushi with tuna from Japan Liang Huang Oregon State University Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark Marrying
More informationFoundations of Informatics: a Bridging Course
Foundations of Informatics: a Bridging Course Week 3: Formal Languages and Semantics Thomas Noll Lehrstuhl für Informatik 2 RWTH Aachen University noll@cs.rwth-aachen.de http://www.b-it-center.de/wob/en/view/class211_id948.html
More informationIntroduction to Theory of Computing
CSCI 2670, Fall 2012 Introduction to Theory of Computing Department of Computer Science University of Georgia Athens, GA 30602 Instructor: Liming Cai www.cs.uga.edu/ cai 0 Lecture Note 3 Context-Free Languages
More informationMA/CSSE 474 Theory of Computation
MA/CSSE 474 Theory of Computation CFL Hierarchy CFL Decision Problems Your Questions? Previous class days' material Reading Assignments HW 12 or 13 problems Anything else I have included some slides online
More informationRelating Probabilistic Grammars and Automata
Relating Probabilistic Grammars and Automata Steven Abney David McAllester Fernando Pereira AT&T Labs-Research 80 Park Ave Florham Park NJ 07932 {abney, dmac, pereira}@research.att.com Abstract Both probabilistic
More informationMultiword Expression Identification with Tree Substitution Grammars
Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic
More informationLatent Variable Models in NLP
Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable
More informationA Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus
A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus Timothy A. D. Fowler Department of Computer Science University of Toronto 10 King s College Rd., Toronto, ON, M5S 3G4, Canada
More informationThe Infinite PCFG using Hierarchical Dirichlet Processes
S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise
More informationDiscriminative Training. March 4, 2014
Discriminative Training March 4, 2014 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder e =
More informationHarvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition
Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition Salil Vadhan October 11, 2012 Reading: Sipser, Section 2.3 and Section 2.1 (material on Chomsky Normal Form). Pumping Lemma for
More informationCPS 220 Theory of Computation
CPS 22 Theory of Computation Review - Regular Languages RL - a simple class of languages that can be represented in two ways: 1 Machine description: Finite Automata are machines with a finite number of
More informationComputational Models - Lecture 4 1
Computational Models - Lecture 4 1 Handout Mode Iftach Haitner. Tel Aviv University. November 21, 2016 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice Herlihy, Brown University.
More informationNatural Language Processing. Lecture 13: More on CFG Parsing
Natural Language Processing Lecture 13: More on CFG Parsing Probabilistc/Weighted Parsing Example: ambiguous parse Probabilistc CFG Ambiguous parse w/probabilites 0.05 0.05 0.20 0.10 0.30 0.20 0.60 0.75
More informationBringing machine learning & compositional semantics together: central concepts
Bringing machine learning & compositional semantics together: central concepts https://githubcom/cgpotts/annualreview-complearning Chris Potts Stanford Linguistics CS 244U: Natural language understanding
More informationObject Detection Grammars
Object Detection Grammars Pedro F. Felzenszwalb and David McAllester February 11, 2010 1 Introduction We formulate a general grammar model motivated by the problem of object detection in computer vision.
More informationParsing Linear Context-Free Rewriting Systems
Parsing Linear Context-Free Rewriting Systems Håkan Burden Dept. of Linguistics Göteborg University cl1hburd@cling.gu.se Peter Ljunglöf Dept. of Computing Science Göteborg University peb@cs.chalmers.se
More informationNatural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation
atural Language Processing 1 lecture 7: constituent parsing Ivan Titov Institute for Logic, Language and Computation Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing
More information