Lecture 9: Decoding
Andreas Maletti
Statistical Machine Translation, Stuttgart, January 20, 2012 (SMT VIII)
Lecture 9

Last time
- Synchronous grammars (tree transducers)
- Rule extraction
- Weight training (EM)

This time
- Inside and outside weights (again)
- n-best derivations
- Cube pruning
- Factorization
Contents
1. Inside and Outside Weights
2. n-best Derivations
3. Cube Pruning
4. Factorization
State Metrics of a Weighted Tree Grammar

Given a weighted tree grammar M = (Q, Σ, I, R); D_M^q denotes the set of derivations in M starting from state q.

Definition
Inside weight of q ∈ Q: in(q) = ∑_{d ∈ D_M^q} wt(d)
Outside weight of q ∈ Q: out(q) = ∑_{t ∈ C_Σ({q})} ∑_{d ∈ D_M(t)} wt(d)
Inside Weight

Question: Why did nobody complain about the infinite sum?

in(q) = ∑_{d ∈ D_M^q} wt(d)
Computing Inside Weights

Theorem
in(q) = ∑_{q →_c σ(q1, …, qk) ∈ R} c · in(q1) ⋯ in(qk)

Problem: this is not a well-founded recursion!

Turn it into a system of equations, one per state:
0 = −in(q) + ∑_{q →_c σ(q1, …, qk) ∈ R} c · in(q1) ⋯ in(qk)
0 = −in(p) + …
⋮
|Q| variables and |Q| equations
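In practice the least solution of this system can be approximated by simply iterating the recursion starting from in(q) = 0 for all states. A minimal Python sketch, assuming a toy grammar with a single state q and made-up rule weights (q → σ(q, q) with weight 0.4 and q → α with weight 0.6), not an example from the lecture:

```python
# Fixed-point iteration for the inside-weight equations
#   in(q) = sum over rules q ->_c sigma(q1,...,qk) of c * in(q1) * ... * in(qk)
# Grammar and weights are illustrative assumptions.
rules = [
    ("q", 0.4, ("q", "q")),  # q -> sigma(q, q) with weight 0.4
    ("q", 0.6, ()),          # q -> alpha with weight 0.6
]

def inside(rules, states, iterations=100):
    val = {q: 0.0 for q in states}  # start below the least solution
    for _ in range(iterations):
        new = {q: 0.0 for q in states}
        for lhs, c, children in rules:
            prod = c
            for ch in children:
                prod *= val[ch]  # multiply in the children's inside weights
            new[lhs] += prod     # sum over all rules with left-hand side lhs
        val = new
    return val

print(inside(rules, {"q"}))  # converges towards in(q) = 1.0
```

Here in(q) solves in(q) = 0.4 · in(q)² + 0.6, whose least solution is 1; plain iteration converges only linearly, which is why the slides turn to faster zero-point methods such as Newton-Raphson.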
Solving a System of Equations

|Q| variables and |Q| equations:
0 = −in(q) + ∑_{q →_c σ(q1, …, qk) ∈ R} c · in(q1) ⋯ in(qk)
0 = −in(p) + …
⋮

Question: Is it a linear system of equations? NO!

Approximation: use any method for approximating zero-points of functions:
- Regula falsi
- Tangent method (Newton-Raphson)
- Secant method
- …
Newton-Raphson Method

Algorithm
1. Select an initial point p
2. Determine the tangent to the curve at p
3. Determine the zero-point p′ of the tangent
4. Set p := p′ and go back to 2
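The tangent-step loop above fits in a few lines of Python; the function f(x) = x² − 2 and the starting point are illustrative assumptions, not taken from the lecture:

```python
# One-dimensional Newton-Raphson: repeatedly jump to the zero of the tangent.
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:   # close enough to a zero-point
            return x
        x = x - fx / fprime(x)  # zero of the tangent at x
    return x

# Example: the square root of 2 as the positive zero of f(x) = x^2 - 2.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

With a good starting point the number of correct digits roughly doubles per step, matching the convergence-speed slide below; a bad starting point can make the iteration diverge.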
Newton-Raphson Iteration (sequence of figures illustrating successive tangent steps)
Convergence Speed

Given a good initial point, Newton-Raphson converges quadratically to the solution (the number of correct decimal digits doubles with each iteration).

But
- Convergence is not guaranteed
- The initial point needs to be reasonably close to the zero-point
Convergence (figure)
Contents
1. Inside and Outside Weights
2. n-best Derivations
3. Cube Pruning
4. Factorization
n-best Derivations

Problem
Given a WTG M = (Q, Σ, I, R) and q ∈ Q, compute the n highest-scoring derivations of D_M^q.

Caveats
The semantics is given by wt(t) = ∑_{q ∈ I} ∑_{d ∈ D_M^q(t)} wt(d)
- We disregard the actual input — use a product construction to restrict to the actual input
- We disregard the summation — no fix!
Viterbi Algorithm

Algorithm
1. Sort the states topologically
2. Select the least unprocessed state q
3. Determine its best derivation (based on the previously processed states)
4. Mark q as processed and return to 2

Requirement
Optimal substructure property (dynamic programming):
a_i ≤ b_i implies f(a_1, …, a_k) ≤ f(b_1, …, b_k)
in our case: a_i ≤ b_i must imply ∏_{i=1}^k a_i ≤ ∏_{i=1}^k b_i
Viterbi Algorithm — Example

Rules: 3 → σ(1, 2), 1 → α (0.7), 1 → β (0.3), 2 → α (0.2), 2 → β (0.8)

Best derivations:
- state 1: α with weight 0.7
- state 2: β with weight 0.8
- state 3: σ(α, β)
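The pass over the example can be sketched in Python as follows; the grammar mirrors the toy example above, and the weight 1.0 for the σ-rule is an assumption, since the transcription does not show it:

```python
# Viterbi best derivation for an acyclic WTG: process states in topological
# order, so every child state already carries its best derivation.
# The sigma-rule weight 1.0 is an assumption for illustration.
rules = [
    (1, "alpha", 0.7, ()),
    (1, "beta", 0.3, ()),
    (2, "alpha", 0.2, ()),
    (2, "beta", 0.8, ()),
    (3, "sigma", 1.0, (1, 2)),
]

def viterbi(rules, topo_order):
    best = {}  # state -> (weight, derivation tree)
    for q in topo_order:
        for lhs, sym, c, children in rules:
            if lhs != q:
                continue
            w = c
            subtrees = []
            for ch in children:
                w *= best[ch][0]          # optimal substructure: best children
                subtrees.append(best[ch][1])
            cand = (w, (sym, tuple(subtrees)))
            if q not in best or cand[0] > best[q][0]:
                best[q] = cand
    return best

best = viterbi(rules, [1, 2, 3])
print(best[3])  # sigma(alpha, beta) with weight 0.7 * 0.8 = 0.56
```

The topological order guarantees that step 3 of the algorithm only ever looks up states whose best derivation is already final.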
Viterbi Algorithm — Exercise (taken from Kaeslin: A Gentle Introduction to Dynamic Programming)
Viterbi Algorithm — Suitable Weight Structures

- (ℝ≥0, +, ·, 0, 1) — YES
- (ℝ, +, ·, 0, 1) — NO
- ({0, 1}, max, min, 0, 1) — YES
- (ℕ, +, ·, 0, 1) — YES
- ([0, 1], max, ·, 0, 1) — YES

Extension
Can we handle cyclic WTG? YES, under the additional requirement
f(a_1, …, a_k) ≤ a_i, or in our case ∏_{i=1}^k a_i ≤ a_i
n-best Algorithm

Observations
- The best subderivations yield the best derivation
- How do we obtain the 2nd-best derivation?
n-best Algorithm — Optimization

Lazy computation of values yields O(|E| + |D_max| · n · log n)
- |E|: number of transitions
- |D_max|: size of the longest derivation in the top n
- n: desired number of derivations
3-best Algorithm (same example grammar as above)

1st derivation: σ(α, β) with weight 0.7 · 0.8 = 0.56

2nd derivation: best of
- β with weight 0.3
- σ with the 2nd-best derivation from state 1 and the best from state 2 (weight 0.3 · 0.8 = 0.24)
- σ with the best derivation from state 1 and the 2nd-best from state 2 (weight 0.7 · 0.2 = 0.14)
⇒ 2nd derivation: β with weight 0.3

3rd derivation: σ(β, β) with weight 0.3 · 0.8 = 0.24
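The candidate bookkeeping above can be sketched with a naive (non-lazy) k-best pass in Python. The grammar mirrors the example; the σ-rule weight 1.0 and the direct β-rule for state 3 with weight 0.3 are assumptions reconstructed from the listed derivation weights. A lazy variant of the same idea gives the O(|E| + |D_max| · n · log n) bound mentioned earlier:

```python
import heapq
from itertools import product

# Naive k-best per state for an acyclic WTG: for every rule, combine the
# k-best lists of the child states and keep the k best results overall.
# Grammar reconstructed for illustration (sigma-rule weight 1.0 and the
# direct beta-rule for state 3 are assumptions).
rules = [
    (1, "alpha", 0.7, ()),
    (1, "beta", 0.3, ()),
    (2, "alpha", 0.2, ()),
    (2, "beta", 0.8, ()),
    (3, "sigma", 1.0, (1, 2)),
    (3, "beta", 0.3, ()),
]

def k_best(rules, topo_order, k):
    best = {}  # state -> list of (weight, tree), best first
    for q in topo_order:
        cands = []
        for lhs, sym, c, children in rules:
            if lhs != q:
                continue
            # all combinations of the children's k-best derivations
            for combo in product(*(best[ch] for ch in children)):
                w = c
                for cw, _ in combo:
                    w *= cw
                cands.append((w, (sym, tuple(t for _, t in combo))))
        best[q] = heapq.nlargest(k, cands, key=lambda x: x[0])
    return best

for w, tree in k_best(rules, [1, 2, 3], 3)[3]:
    print(round(w, 2), tree)
```

On this grammar the pass reproduces the slide's answer: σ(α, β) at 0.56, β at 0.3, σ(β, β) at 0.24. The naive version enumerates all k² child combinations per rule; the lazy version pops them best-first from a heap instead.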
5-best Algorithm — Exercise
Contents
1. Inside and Outside Weights
2. n-best Derivations
3. Cube Pruning
4. Factorization
Cube Pruning

Cube pruning = n-best + language model
Cube Pruning — Syntax-based Systems

Pipeline: Input → Parser → Machine translation system → Language model → Output

Notes
- We can compute n-best lists individually for each component
- But that leads to large search errors
- Slightly better: a 10^(n+2)-best list combined with a 10^n-best list

Alternatives
- Full integration (intersection) — too expensive
- Pruning and beam search — too bad
- Cube pruning
Cube Pruning — Algorithm

Proceed as for n-best derivations, but multiply in the language model score at the end.
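One way to sketch this in Python: explore the grid (the "cube") of index pairs over two sorted child k-best lists best-first by their LM-free scores, and multiply in a language-model factor only when a combination is popped. The lists and the toy lm function are illustrative assumptions:

```python
import heapq

# Cube-pruning-style combination of two sorted k-best lists. The frontier is
# ordered by the LM-free score; the (here: toy) language-model factor is
# multiplied in only when a combination is popped, as on the slide.
def cube(list_a, list_b, lm, k):
    heap = [(-(list_a[0][0] * list_b[0][0]), 0, 0)]  # negated score, i, j
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        _, i, j = heapq.heappop(heap)
        wa, ta = list_a[i]
        wb, tb = list_b[j]
        out.append((wa * wb * lm(ta, tb), (ta, tb)))  # LM applied on pop
        for ni, nj in ((i + 1, j), (i, j + 1)):       # expand grid neighbours
            if ni < len(list_a) and nj < len(list_b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(list_a[ni][0] * list_b[nj][0]), ni, nj))
    out.sort(key=lambda x: -x[0])  # popped order is not the true order
    return out

a = [(0.7, "alpha"), (0.3, "beta")]
b = [(0.8, "beta"), (0.2, "alpha")]
lm = lambda x, y: 0.5 if x == y else 1.0  # toy "language model" penalty
print(cube(a, b, lm, 3))
```

Because the LM factor is only applied on pop, the popped order can disagree with the true order; this is exactly why the output list is not sorted and the method comes with no guarantees, as the next slide notes.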
Cube Pruning — Observations
- retains the efficiency of the n-best derivation algorithm
- the output list is not sorted
- no guarantees
Contents
1. Inside and Outside Weights
2. n-best Derivations
3. Cube Pruning
4. Factorization
Factorization

Motivation
rk(M) is an essential part of the complexity of XTT operations (like input restriction):
O(|R| · len(M) · |P|^(2·rk(M)+5))

Factorization
- reduce rk(M) by decomposing rules
- maximal decomposition preferred
Factorization — References
- Zhang, Huang, Gildea, Knight: Synchronous binarization for machine translation. HLT-NAACL 2006 [for SCFG]
- Gildea, Satta, Zhang: Factoring synchronous grammars by sorting. COLING/ACL 2006 [for SCFG]
- Nesson, Satta, Shieber: Optimal k-arization of synchronous tree-adjoining grammar. ACL 2008 [for STAG]
- Gildea: Optimal parsing strategies for linear context-free rewriting systems. HLT-NAACL 2010 [for LCFRS]
Maximal Factorization

Common advertisement slogans: "optimal k-arization", "optimal parsing strategy"

But
- factorization is just one way to reduce rk(M)
- the obtained "optimal" rank is not optimal for the given transformation
- rk(M) is just one parameter of the parsing complexity

Conclusion: both "optimal k-arization" and "optimal parsing strategy" boil down to maximal factorization.
Construction — Example

A rule whose left- and right-hand sides are trees over γ and σ with states q, q1, q2, q3 is decomposed into constructed rules of smaller rank (tree diagrams omitted).
Construction — Notes
- runs in linear time
- returns a maximal (meaningful) factorization
- the returned XTOP is rank-optimal (among all factorizations)

More Notes
- Factorization is strongly related to rule extraction
- Rule extraction always yields a rank-optimal XTOP
References
- Huang, Chiang: Better k-best parsing. IWPT 2005
- Chiang: Hierarchical phrase-based translation. Computational Linguistics 33, 2007
- May, Knight: Tiburon — a weighted tree automata toolkit. CIAA 2006

Thank you for your attention!

Questions?
More informationTo make a grammar probabilistic, we need to assign a probability to each context-free rewrite
Notes on the Inside-Outside Algorithm To make a grammar probabilistic, we need to assign a probability to each context-free rewrite rule. But how should these probabilities be chosen? It is natural to
More informationParsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)
Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements
More informationSynchronous Grammars
ynchronous Grammars ynchronous grammars are a way of simultaneously generating pairs of recursively related strings ynchronous grammar w wʹ ynchronous grammars were originally invented for programming
More informationTHE LANGUAGE OF FUNCTIONS *
PRECALCULUS THE LANGUAGE OF FUNCTIONS * In algebra you began a process of learning a basic vocabulary, grammar, and syntax forming the foundation of the language of mathematics. For example, consider the
More informationLecture Notes on Inductive Definitions
Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 September 2, 2004 These supplementary notes review the notion of an inductive definition and
More informationNON-LINEAR ALGEBRAIC EQUATIONS Lec. 5.1: Nonlinear Equation in Single Variable
NON-LINEAR ALGEBRAIC EQUATIONS Lec. 5.1: Nonlinear Equation in Single Variable Dr. Niket Kaisare Department of Chemical Engineering IIT Madras NPTEL Course: MATLAB Programming for Numerical Computations
More informationSOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS BISECTION METHOD
BISECTION METHOD If a function f(x) is continuous between a and b, and f(a) and f(b) are of opposite signs, then there exists at least one root between a and b. It is shown graphically as, Let f a be negative
More informationOut of GIZA Efficient Word Alignment Models for SMT
Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza
More informationRoots of Equations. ITCS 4133/5133: Introduction to Numerical Methods 1 Roots of Equations
Roots of Equations Direct Search, Bisection Methods Regula Falsi, Secant Methods Newton-Raphson Method Zeros of Polynomials (Horner s, Muller s methods) EigenValue Analysis ITCS 4133/5133: Introduction
More informationAspects of Tree-Based Statistical Machine Translation
Aspects of Tree-Based tatistical Machine Translation Marcello Federico (based on slides by Gabriele Musillo) Human Language Technology FBK-irst 2011 Outline Tree-based translation models: ynchronous context
More informationFeature-Rich Translation by Quasi-Synchronous Lattice Parsing
Feature-Rich Translation by Quasi-Synchronous Lattice Parsing Kevin Gimpel and Noah A. Smith Language Technologies Institute School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213,
More informationHence a root lies between 1 and 2. Since f a is negative and f(x 0 ) is positive The root lies between a and x 0 i.e. 1 and 1.
The Bisection method or BOLZANO s method or Interval halving method: Find the positive root of x 3 x = 1 correct to four decimal places by bisection method Let f x = x 3 x 1 Here f 0 = 1 = ve, f 1 = ve,
More informationImproving Sequence-to-Sequence Constituency Parsing
Improving Sequence-to-Sequence Constituency Parsing Lemao Liu, Muhua Zhu and Shuming Shi Tencent AI Lab, Shenzhen, China {redmondliu,muhuazhu, shumingshi}@tencent.com Abstract Sequence-to-sequence constituency
More informationNumerical Methods. Root Finding
Numerical Methods Solving Non Linear 1-Dimensional Equations Root Finding Given a real valued function f of one variable (say ), the idea is to find an such that: f() 0 1 Root Finding Eamples Find real
More informationComposition Closure of ε-free Linear Extended Top-down Tree Transducers
Composition Closure of ε-free Linear Extended Top-down Tree Transducers Zoltán Fülöp 1, and Andreas Maletti 2, 1 Department of Foundations of Computer Science, University of Szeged Árpád tér 2, H-6720
More informationComparative Analysis of Convergence of Various Numerical Methods
Journal of Computer and Mathematical Sciences, Vol.6(6),290-297, June 2015 (An International Research Journal), www.compmath-journal.org ISSN 0976-5727 (Print) ISSN 2319-8133 (Online) Comparative Analysis
More informationp 1 p 0 (p 1, f(p 1 )) (p 0, f(p 0 )) The geometric construction of p 2 for the se- cant method.
80 CHAP. 2 SOLUTION OF NONLINEAR EQUATIONS f (x) = 0 y y = f(x) (p, 0) p 2 p 1 p 0 x (p 1, f(p 1 )) (p 0, f(p 0 )) The geometric construction of p 2 for the se- Figure 2.16 cant method. Secant Method The
More informationCross-Entropy and Estimation of Probabilistic Context-Free Grammars
Cross-Entropy and Estimation of Probabilistic Context-Free Grammars Anna Corazza Department of Physics University Federico II via Cinthia I-8026 Napoli, Italy corazza@na.infn.it Giorgio Satta Department
More informationThe Power of Weighted Regularity-Preserving Multi Bottom-up Tree Transducers
International Journal of Foundations of Computer cience c World cientific Publishing Company The Power of Weighted Regularity-Preserving Multi Bottom-up Tree Transducers ANDREA MALETTI Universität Leipzig,
More informationComputing Lattice BLEU Oracle Scores for Machine Translation
Computing Lattice Oracle Scores for Machine Translation Artem Sokolov & Guillaume Wisniewski & François Yvon {firstname.lastname}@limsi.fr LIMSI, Orsay, France 1 Introduction 2 Oracle Decoding Task 3 Proposed
More informationContents. Preface to the Third Edition (2007) Preface to the Second Edition (1992) Preface to the First Edition (1985) License and Legal Information
Contents Preface to the Third Edition (2007) Preface to the Second Edition (1992) Preface to the First Edition (1985) License and Legal Information xi xiv xvii xix 1 Preliminaries 1 1.0 Introduction.............................
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationChapter 14 (Partially) Unsupervised Parsing
Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with
More informationConsensus Training for Consensus Decoding in Machine Translation
Consensus Training for Consensus Decoding in Machine Translation Adam Pauls, John DeNero and Dan Klein Computer Science Division University of California at Berkeley {adpauls,denero,klein}@cs.berkeley.edu
More informationOnline Learning in Tensor Space
Online Learning in Tensor Space Yuan Cao Sanjeev Khudanpur Center for Language & Speech Processing and Human Language Technology Center of Excellence The Johns Hopkins University Baltimore, MD, USA, 21218
More informationParsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication
Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication Shay B. Cohen University of Edinburgh Daniel Gildea University of Rochester We describe a recognition algorithm for a subset
More informationCHAPTER-II ROOTS OF EQUATIONS
CHAPTER-II ROOTS OF EQUATIONS 2.1 Introduction The roots or zeros of equations can be simply defined as the values of x that makes f(x) =0. There are many ways to solve for roots of equations. For some
More informationTranslation as Weighted Deduction
Translation as Weighted Deduction Adam Lopez University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB United Kingdom alopez@inf.ed.ac.uk Abstract We present a unified view of many translation algorithms
More informationFinite-State Transducers
Finite-State Transducers - Seminar on Natural Language Processing - Michael Pradel July 6, 2007 Finite-state transducers play an important role in natural language processing. They provide a model for
More informationExtended Multi Bottom-Up Tree Transducers
Extended Multi Bottom-Up Tree Transducers Joost Engelfriet 1, Eric Lilin 2, and Andreas Maletti 3;? 1 Leiden Instite of Advanced Comper Science Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
More informationStatistical Machine Translation
Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding
More informationSolving nonlinear equations (See online notes and lecture notes for full details) 1.3: Newton s Method
Solving nonlinear equations (See online notes and lecture notes for full details) 1.3: Newton s Method MA385 Numerical Analysis September 2018 (1/16) Sir Isaac Newton, 1643-1727, England. Easily one of
More informationMath 4329: Numerical Analysis Chapter 03: Fixed Point Iteration and Ill behaving problems. Natasha S. Sharma, PhD
Why another root finding technique? iteration gives us the freedom to design our own root finding algorithm. The design of such algorithms is motivated by the need to improve the speed and accuracy of
More informationTranslation as Weighted Deduction
Translation as Weighted Deduction Adam Lopez University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB United Kingdom alopez@inf.ed.ac.uk Abstract We present a unified view of many translation algorithms
More informationNatural Language Processing (CSEP 517): Machine Translation
Natural Language Processing (CSEP 57): Machine Translation Noah Smith c 207 University of Washington nasmith@cs.washington.edu May 5, 207 / 59 To-Do List Online quiz: due Sunday (Jurafsky and Martin, 2008,
More informationMargin-based Decomposed Amortized Inference
Margin-based Decomposed Amortized Inference Gourab Kundu and Vivek Srikumar and Dan Roth University of Illinois, Urbana-Champaign Urbana, IL. 61801 {kundu2, vsrikum2, danr}@illinois.edu Abstract Given
More informationSyntax-Based Decoding
Syntax-Based Decoding Philipp Koehn 9 November 2017 1 syntax-based models Synchronous Context Free Grammar Rules 2 Nonterminal rules NP DET 1 2 JJ 3 DET 1 JJ 3 2 Terminal rules N maison house NP la maison
More informationZeroes of Transcendental and Polynomial Equations. Bisection method, Regula-falsi method and Newton-Raphson method
Zeroes of Transcendental and Polynomial Equations Bisection method, Regula-falsi method and Newton-Raphson method PRELIMINARIES Solution of equation f (x) = 0 A number (real or complex) is a root of the
More informationBayesian Learning of Non-compositional Phrases with Synchronous Parsing
Bayesian Learning of Non-compositional Phrases with Synchronous Parsing Hao Zhang Computer Science Department University of Rochester Rochester, NY 14627 zhanghao@cs.rochester.edu Chris Quirk Microsoft
More informationCMSC 330: Organization of Programming Languages. Pushdown Automata Parsing
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing Chomsky Hierarchy Categorization of various languages and grammars Each is strictly more restrictive than the previous First described
More informationAutomata Theory. CS F-10 Non-Context-Free Langauges Closure Properties of Context-Free Languages. David Galles
Automata Theory CS411-2015F-10 Non-Context-Free Langauges Closure Properties of Context-Free Languages David Galles Department of Computer Science University of San Francisco 10-0: Fun with CFGs Create
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More informationStatistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks
Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School - University of Pisa Pisa, 7-19 May 008 Part III: Search Problem 1 Complexity issues A search: with single
More information2. Tree-to-String [6] (Phrase Based Machine Translation, PBMT)[1] [2] [7] (Targeted Self-Training) [7] Tree-to-String [3] Tree-to-String [4]
1,a) 1,b) Graham Nubig 1,c) 1,d) 1,) 1. (Phras Basd Machin Translation, PBMT)[1] [2] Tr-to-String [3] Tr-to-String [4] [5] 1 Graduat School of Information Scinc, Nara Institut of Scinc and Tchnology a)
More informationLog-Linear Models, MEMMs, and CRFs
Log-Linear Models, MEMMs, and CRFs Michael Collins 1 Notation Throughout this note I ll use underline to denote vectors. For example, w R d will be a vector with components w 1, w 2,... w d. We use expx
More information