Utilizing Portion of Patent Families with No Parallel Sentences Extracted in Estimating Translation of Technical Terms
|
|
- Hannah Harmon
- 5 years ago
- Views:
Transcription
1 % 70% 70% NTCIR-7 13% 90% 1,000 Utilizing Portion of Patent Families with No Parallel Sentences Extracted in Estimating Translation of Technical Terms Itsuki Toyota 1 Yusuke Takahashi 1 Kensaku Makita 1 Takehito Utsuro 2 Mikio Yamamoto 2 Abstract: A bilingual lexicon for technical terms is necessary in the process of translating patent documents. This paper studies a method of generating bilingual lexicon for technical terms from parallel patent documents. In the previous methods of generating bilingual lexicon from parallel patent documents, the portion from which parallel patent sentences are extracted is composed of the parts of Background and Embodiment. However, this portion is about 30% out of the whole Background and Embodiment parts and about 70% are not used. Considering this situation, this paper proposes to generate bilingual lexicon for technical terms from the remaining 70% out of the whole Background and Embodiment parts. The proposed method employs the compositional translation estimation technique which uses an existing bilingual lexicon. We show that, for 13% of the Japanese compound nouns that are not included in the phrase translation table trained with parallel patent sentences nor in the existing bilingual lexicon, translation candidates can be generated through the compositional translation estimation technique, and can be found in the English part of the patent family. On the average, we generate about two pairs of bilingual technical terms per patent family and we achieve over 90% accuracy. Keywords: bilingual lexicon for technical terms, patent family, statistical machine translation cfl 2012 Information Processing Society of Japan 1
2 1. [10] NTCIR-7 [3] 180 [7] Support Vector Machines (SVMs) [15] [8] [10] 180 [14] 30% 70% NTCIR-7 [12] 13% 90% 1, NTCIR-7 [3] 180 ( 1 ) (2) 1 Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, , Japan 2 Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, , Japan (3) (4) [14] 3. Moses [6] Moses. (1) (2) (3) (4) ( ). (5) (4) V 1,V 2,...,V n W 1,W 2,...,W m P J (= V p V p ) P E (= W q W q ) P J,P E T J,T E V i,w j p i p q j q P J P E T J,T E cfl 2012 Information Processing Society of Japan 2
3 1 1 1,631,099 1,847,945 2,244,117 47,554 41, ,420 24,696 23,025 82,087 1 parallel mode * 1 *2 [12] 2 ( JUMAN *3 ) P 2 P 2 B P B S Ver.131 Ver.79 Ver.131 *1 *2 Ver.79 Ver.131 * y S y S y T y S s i y S = s 1,s 2,,s n s i t i y T y T = t 1,t 2,,t n y S,y T y S,y T = s 1,t 1, s 2,t 2,, s n,t n y T y S y T ( ) y T s i,t i q( s i,t i ) y T cfl 2012 Information Processing Society of Japan 3
4 2 (PSD ) (NPSD ) y T ( Q corpus (y T ) ) y T 2 n q( s i,t i ) Q corpus (y T ) i=1 2 Q(y S,y T ) Q(y S,y T )= y S=s 1,s 2,...,s n i=1 n q( s i,t i ) Q corpus (y T ) s, t q( s, t ) s, t 10 (compo(s) 1) q( s, t ) = log 10 f p ( s, t ) B P log 10 f s ( s, t ) B S compo(s) s f p ( s, t ) P 2 s, t f p ( s, t ) P 2 s, t Q corpus (y T ) y T { 1 Q corpus (y T )= 0 5. NJ 1 B J NJ 2 M J NJ 3 B J M J PSD J NPSD J 2 D E PSD E NPSD E *4 D J = N 1 J,B J,N 2 J,M J,N 3 J B J M J = PSD J,NPSD J D E = PSD E,NPSD E B J M J NPSD J t J *4 cfl 2012 Information Processing Society of Japan 4
5 2 1,000 ( (%)) 345 (1.5) 2,914 (13.0) 13,165 (58.8) 5,972 (26.7) 22,396 (100) ( (%)) 5 (1.0) 71 (14.2) 423 (84.6) 1 (0.2) 500 (100) t J D E NPSD E TranCand(t J,NPSD E ) ) TranCand (t J,NP SD E { = t E NPSD E tj } t E Q(t J,t E ) > 0 TranCand(t J,NPSD E ) CompoTrans max CompoTrans max (t J,NP SD E ) = arg max Q(t J,t E ) t E TranCand(t J,NP SD E) D E NPSD E t E 6. 1,000 1,000 NTCIR-7 2 2, ,165 5,972 19, (84.8%) (99.8%) DB DB (79%) (65.8%) 52 cfl 2012 Information Processing Society of Japan 5
6 () ( (%)) 6 (11.6) 1 (1.9) and 1 (1.9) 1 (1.9) 43 (82.7) (11.6%) 7. ( ) ( ) IBM [1,2] [6] [9] [10] NTCIR-7 [3] ( 180 ) Support Vector Machines (SVMs) [15] [8] [9] ( ) [5] Google *5 [4] *5 cfl 2012 Information Processing Society of Japan 6
7 [11],. [13] 8. 70% NTCIR-7 [3] 13% 90% 1,000 [8] 17 pp (2011). [9] Matsumoto, Y. and Utsuro, T.: Lexical Knowledge Acquisition, Handbook of Natural Language Processing (Dale, R., Moisl, H. and Somers, H., eds.), Marcel Dekker Inc., chapter 24, pp (2000). [10] Vol. J93 D, No. 11, pp (2010). [11] 13 pp (2007). [12] 11 pp (2005). [13] Vol. 14, No. 2, pp (2007). [14] Utiyama, M. and Isahara, H.: A Japanese-English Patent Parallel Corpus, Proc. MT Summit XI, pp (2007). [15] Vapnik, V. N.: Statistical Learning Theory, Wiley- Interscience (1998). [1] Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J. D., Mercer, R. L. and Roosin, P. S.: A Statistical Approach to Machine Translation, Computational Linguistics, Vol. 16, No. 2, pp (1990). [2] Brown, P. F., Della Pietra, S. A., Della Pietra, V. J. and Mercer, R. L.: The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, Vol. 19, No. 2, pp (1993). [3] Fujii, A., Utiyama, M., Yamamoto, M. and Utsuro, T.: Overview of the Patent Translation Task at the NTCIR- 7 Workshop, Proc. 7th NTCIR Workshop Meeting, pp (2008). [4] World Wide Web Vol. 47, No. 3, pp (2006). [5] Huang, F., Zhang, Y. and Vogel, S.: Mining Key Phrase Translations from Web Corpora, Proc. HLT/EMNLP, pp (2005). [6] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A. and Herbst, E.: Moses: Open Source Toolkit for Statistical Machine Translation, Proc. 45th ACL, Companion Volume, pp (2007). [7] Koehn, P., Och, F. J. and Marcu, D.: Statistical Phrase- Based Translation, Proc. HLT-NAACL, pp (2003). cfl 2012 Information Processing Society of Japan 7
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Joern Wuebker, Hermann Ney Human Language Technology and Pattern Recognition Group Computer Science
More informationComputing Optimal Alignments for the IBM-3 Translation Model
Computing Optimal Alignments for the IBM-3 Translation Model Thomas Schoenemann Centre for Mathematical Sciences Lund University, Sweden Abstract Prior work on training the IBM-3 translation model is based
More informationImproving Relative-Entropy Pruning using Statistical Significance
Improving Relative-Entropy Pruning using Statistical Significance Wang Ling 1,2 N adi Tomeh 3 Guang X iang 1 Alan Black 1 Isabel Trancoso 2 (1)Language Technologies Institute, Carnegie Mellon University,
More informationShift-Reduce Word Reordering for Machine Translation
Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,
More informationShift-Reduce Word Reordering for Machine Translation
Shift-Reduce Word Reordering for Machine Translation Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai,
More informationPhrase Table Pruning via Submodular Function Maximization
Phrase Table Pruning via Submodular Function Maximization Masaaki Nishino and Jun Suzuki and Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai, Seika-cho, Soraku-gun,
More informationNatural Language Processing (CSEP 517): Machine Translation
Natural Language Processing (CSEP 57): Machine Translation Noah Smith c 207 University of Washington nasmith@cs.washington.edu May 5, 207 / 59 To-Do List Online quiz: due Sunday (Jurafsky and Martin, 2008,
More informationEnhanced Bilingual Evaluation Understudy
Enhanced Bilingual Evaluation Understudy Krzysztof Wołk, Krzysztof Marasek Department of Multimedia Polish Japanese Institute of Information Technology, Warsaw, POLAND kwolk@pjwstk.edu.pl Abstract - Our
More informationToponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge
Toponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge Raivis Skadiņš Tilde SIA Vienibas gatve 75a, Riga, Latvia raiviss@tilde.lv Tatiana Gornostay Tilde SIA Vienibas gatve 75a,
More informationarxiv: v1 [cs.cl] 22 Jun 2015
Neural Transformation Machine: A New Architecture for Sequence-to-Sequence Learning arxiv:1506.06442v1 [cs.cl] 22 Jun 2015 Fandong Meng 1 Zhengdong Lu 2 Zhaopeng Tu 2 Hang Li 2 and Qun Liu 1 1 Institute
More informationWord Alignment via Quadratic Assignment
Word Alignment via Quadratic Assignment Simon Lacoste-Julien UC Berkeley, Berkeley, CA 94720 slacoste@cs.berkeley.edu Ben Taskar UC Berkeley, Berkeley, CA 94720 taskar@cs.berkeley.edu Dan Klein UC Berkeley,
More informationNeural Hidden Markov Model for Machine Translation
Neural Hidden Markov Model for Machine Translation Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan and Hermann Ney {surname}@i6.informatik.rwth-aachen.de July 17th, 2018 Human Language Technology and
More informationLattice-based System Combination for Statistical Machine Translation
Lattice-based System Combination for Statistical Machine Translation Yang Feng, Yang Liu, Haitao Mi, Qun Liu, Yajuan Lu Key Laboratory of Intelligent Information Processing Institute of Computing Technology
More informationSampling Alignment Structure under a Bayesian Translation Model
Sampling Alignment Structure under a Bayesian Translation Model John DeNero, Alexandre Bouchard-Côté and Dan Klein Computer Science Department University of California, Berkeley {denero, bouchard, klein}@cs.berkeley.edu
More informationInvestigating Connectivity and Consistency Criteria for Phrase Pair Extraction in Statistical Machine Translation
Investigating Connectivity and Consistency Criteria for Phrase Pair Extraction in Statistical Machine Translation Spyros Martzoukos Christophe Costa Florêncio and Christof Monz Intelligent Systems Lab
More informationCoactive Learning for Interactive Machine Translation
Coactive Learning for Interactive Machine ranslation Artem Sokolov Stefan Riezler Shay B. Cohen Computational Linguistics Heidelberg University 69120 Heidelberg, Germany sokolov@cl.uni-heidelberg.de Computational
More informationWord Alignment via Quadratic Assignment
University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science 2006 Word Alignment via Quadratic Assignment Simon Lacoste-Julien University of California
More informationMulti-Task Word Alignment Triangulation for Low-Resource Languages
Multi-Task Word Alignment Triangulation for Low-Resource Languages Tomer Levinboim and David Chiang Department of Computer Science and Engineering University of Notre Dame {levinboim.1,dchiang}@nd.edu
More informationWord Alignment by Thresholded Two-Dimensional Normalization
Word Alignment by Thresholded Two-Dimensional Normalization Hamidreza Kobdani, Alexander Fraser, Hinrich Schütze Institute for Natural Language Processing University of Stuttgart Germany {kobdani,fraser}@ims.uni-stuttgart.de
More informationMulti-Source Neural Translation
Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu In the neural encoder-decoder
More informationMulti-Source Neural Translation
Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu Abstract We build a multi-source
More informationFast Collocation-Based Bayesian HMM Word Alignment
Fast Collocation-Based Bayesian HMM Word Alignment Philip Schulz ILLC University of Amsterdam P.Schulz@uva.nl Wilker Aziz ILLC University of Amsterdam W.Aziz@uva.nl Abstract We present a new Bayesian HMM
More informationExact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation Abstract This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation.
More informationA Systematic Comparison of Training Criteria for Statistical Machine Translation
A Systematic Comparison of Training Criteria for Statistical Machine Translation Richard Zens and Saša Hasan and Hermann Ney Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6
More informationPhrase-Based Statistical Machine Translation with Pivot Languages
Phrase-Based Statistical Machine Translation with Pivot Languages N. Bertoldi, M. Barbaiani, M. Federico, R. Cattoni FBK, Trento - Italy Rovira i Virgili University, Tarragona - Spain October 21st, 2008
More informationNatural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, & Finale
Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, & Finale Noah Smith c 2017 University of Washington nasmith@cs.washington.edu May 22, 2017 1 / 30 To-Do List Online
More informationA Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister
A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model
More informationA phrase-based hidden Markov model approach to machine translation
A phrase-based hidden Markov model approach to machine translation Jesús Andrés-Ferrer Universidad Politécnica de Valencia Dept. Sist. Informáticos y Computación jandres@dsic.upv.es Alfons Juan-Císcar
More informationMinimum Bayes-risk System Combination
Minimum Bayes-risk System Combination Jesús González-Rubio Instituto Tecnológico de Informática U. Politècnica de València 46022 Valencia, Spain jegonzalez@iti.upv.es Alfons Juan Francisco Casacuberta
More informationPhrasetable Smoothing for Statistical Machine Translation
Phrasetable Smoothing for Statistical Machine Translation George Foster and Roland Kuhn and Howard Johnson National Research Council Canada Ottawa, Ontario, Canada firstname.lastname@nrc.gc.ca Abstract
More informationExact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation Yin-Wen Chang MIT CSAIL Cambridge, MA 02139, USA yinwen@csail.mit.edu Michael Collins Department of Computer Science, Columbia
More informationTranslation as Weighted Deduction
Translation as Weighted Deduction Adam Lopez University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB United Kingdom alopez@inf.ed.ac.uk Abstract We present a unified view of many translation algorithms
More informationAutomatically Evaluating Text Coherence using Anaphora and Coreference Resolution
1 1 Barzilay 1) Automatically Evaluating Text Coherence using Anaphora and Coreference Resolution Ryu Iida 1 and Takenobu Tokunaga 1 We propose a metric for automatically evaluating discourse coherence
More informationConsensus Training for Consensus Decoding in Machine Translation
Consensus Training for Consensus Decoding in Machine Translation Adam Pauls, John DeNero and Dan Klein Computer Science Division University of California at Berkeley {adpauls,denero,klein}@cs.berkeley.edu
More informationProbabilistic Word Alignment under the L 0 -norm
15th Conference on Computational Natural Language Learning, Portland, Oregon, 2011 Probabilistic Word Alignment under the L 0 -norm Thomas Schoenemann Center for Mathematical Sciences Lund University,
More informationLanguage Information RetrievalÓ j
± i jschang@cs.nthu.edu.tw o ± k j ± j z ± wyw j ± j ÒCross Language Information RetrievalÓ j h ±ÒQuery TranslationÓ ± k h ± ± ± k ± h ± j ± ± ± 1. ± j y k j ± j z ± y p wywj ± 1 j ÒCross-Language Information
More informationGeneralized Stack Decoding Algorithms for Statistical Machine Translation
Generalized Stack Decoding Algorithms for Statistical Machine Translation Daniel Ortiz Martínez Inst. Tecnológico de Informática Univ. Politécnica de Valencia 4607 Valencia, Spain dortiz@iti.upv.es Ismael
More informationNatural Language Processing (CSE 517): Machine Translation
Natural Language Processing (CSE 517): Machine Translation Noah Smith c 2018 University of Washington nasmith@cs.washington.edu May 23, 2018 1 / 82 Evaluation Intuition: good translations are fluent in
More informationTranslation as Weighted Deduction
Translation as Weighted Deduction Adam Lopez University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB United Kingdom alopez@inf.ed.ac.uk Abstract We present a unified view of many translation algorithms
More informationIntegrating Morphology in Probabilistic Translation Models
Integrating Morphology in Probabilistic Translation Models Chris Dyer joint work with Jon Clark, Alon Lavie, and Noah Smith January 24, 2011 lti das alte Haus the old house mach das do that 2 das alte
More informationCLRG Biocreative V
CLRG ChemTMiner @ Biocreative V Sobha Lalitha Devi., Sindhuja Gopalan., Vijay Sundar Ram R., Malarkodi C.S., Lakshmi S., Pattabhi RK Rao Computational Linguistics Research Group, AU-KBC Research Centre
More informationCKY-based Convolutional Attention for Neural Machine Translation
CKY-based Convolutional Attention for Neural Machine Translation Taiki Watanabe and Akihiro Tamura and Takashi Ninomiya Ehime University 3 Bunkyo-cho, Matsuyama, Ehime, JAPAN {t_watanabe@ai.cs, tamura@cs,
More informationTuning as Linear Regression
Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical
More informationStructure and Complexity of Grammar-Based Machine Translation
Structure and of Grammar-Based Machine Translation University of Padua, Italy New York, June 9th, 2006 1 2 Synchronous context-free grammars Definitions Computational problems 3 problem SCFG projection
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationBreaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation Yilin Yang 1 Liang Huang 1,2 Mingbo Ma 1,2 1 Oregon State University Corvallis, OR,
More informationMulti-Task Minimum Error Rate Training for SMT
Minimum Error Rate Training for SMT Patrick Katharina Stefan Department of Computational Linguistics University of Heielberg, Germany Learning Multi-task learning aims at learning several ifferent tasks
More informationApplications of Deep Learning
Applications of Deep Learning Alpha Go Google Translate Data Center Optimisation Robin Sigurdson, Yvonne Krumbeck, Henrik Arnelid November 23, 2016 Template by Philipp Arndt Applications of Deep Learning
More informationExploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation
Journal of Machine Learning Research 12 (2011) 1-30 Submitted 5/10; Revised 11/10; Published 1/11 Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation Yizhao
More informationMaximal Lattice Overlap in Example-Based Machine Translation
Maximal Lattice Overlap in Example-Based Machine Translation Rebecca Hutchinson Paul N. Bennett Jaime Carbonell Peter Jansen Ralf Brown June 6, 2003 CMU-CS-03-138 School of Computer Science Carnegie Mellon
More informationTime Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter
Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Daichi Koike Yusuke Takahashi Takehito Utsuro Grad. Sch. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573,
More informationAsymmetric Term Alignment with Selective Contiguity Constraints by Multi-Tape Automata
Asymmetric Term Alignment with Selective Contiguity Constraints by Multi-Tape Automata Mădălina Barbaiani,, Nicola Cancedda, Chris Dance, Szilárd Fazekas,, Tamás Gaál, and Éric Gaussier, Xerox Research
More informationLearning Features from Co-occurrences: A Theoretical Analysis
Learning Features from Co-occurrences: A Theoretical Analysis Yanpeng Li IBM T. J. Watson Research Center Yorktown Heights, New York 10598 liyanpeng.lyp@gmail.com Abstract Representing a word by its co-occurrences
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Raghavendra Udupa U Hemanta K Mai IBM India Research Laboratory, New Delhi {uraghave, hemantkm}@inibmcom Abstract Viterbi
More informationUsing a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs
Using a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs Yasuhiro Akiba,, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, and Hiroshi G. Okuno ATR
More informationTree as a Pivot: Syntactic Matching Methods in Pivot Translation
Tree as a Pivot: Syntactic Matching Methods in Pivot Translation Akiva Miura, Graham Neubig,, Katsuhito Sudoh, Satoshi Nakamura Nara Institute of Science and Technology, Japan Carnegie Mellon University,
More informationTriplet Lexicon Models for Statistical Machine Translation
Triplet Lexicon Models for Statistical Machine Translation Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer lastname@cs.rwth-aachen.de CLSP Student Seminar February 6, 2009 Human Language
More informationFeature-Rich Translation by Quasi-Synchronous Lattice Parsing
Feature-Rich Translation by Quasi-Synchronous Lattice Parsing Kevin Gimpel and Noah A. Smith Language Technologies Institute School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213,
More informationBayesian Learning of Non-compositional Phrases with Synchronous Parsing
Bayesian Learning of Non-compositional Phrases with Synchronous Parsing Hao Zhang Computer Science Department University of Rochester Rochester, NY 14627 zhanghao@cs.rochester.edu Chris Quirk Microsoft
More informationLexical Translation Models 1I
Lexical Translation Models 1I Machine Translation Lecture 5 Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu Website: mt-class.org/penn Last Time... X p( Translation)= p(, Translation)
More informationNTT/NAIST s Text Summarization Systems for TSC-2
Proceedings of the Third NTCIR Workshop NTT/NAIST s Text Summarization Systems for TSC-2 Tsutomu Hirao Kazuhiro Takeuchi Hideki Isozaki Yutaka Sasaki Eisaku Maeda NTT Communication Science Laboratories,
More informationAn Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz
More informationEfficient Incremental Decoding for Tree-to-String Translation
Efficient Incremental Decoding for Tree-to-String Translation Liang Huang 1 1 Information Sciences Institute University of Southern California 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292, USA
More information2. Tree-to-String [6] (Phrase Based Machine Translation, PBMT)[1] [2] [7] (Targeted Self-Training) [7] Tree-to-String [3] Tree-to-String [4]
1,a) 1,b) Graham Nubig 1,c) 1,d) 1,) 1. (Phras Basd Machin Translation, PBMT)[1] [2] Tr-to-String [3] Tr-to-String [4] [5] 1 Graduat School of Information Scinc, Nara Institut of Scinc and Tchnology a)
More informationA Recursive Statistical Translation Model
A Recursive Statistical Translation Model Juan Miguel Vilar Dpto. de Lenguajes y Sistemas Informáticos Universitat Jaume I Castellón (Spain) jvilar@lsi.uji.es Enrique Vidal Dpto. de Sistemas Informáticos
More informationAcquiring Strongly-related Events using Predicate-argument Co-occurring Statistics and Caseframe
1 1 16 Web 96% 79.1% 2 Acquiring Strongly-related Events using Predicate-argument Co-occurring Statistics and Caseframe Tomohide Shibata 1 and Sadao Kurohashi 1 This paper proposes a method for automatically
More informationstatistical machine translation
statistical machine translation P A R T 3 : D E C O D I N G & E V A L U A T I O N CSC401/2511 Natural Language Computing Spring 2019 Lecture 6 Frank Rudzicz and Chloé Pou-Prom 1 University of Toronto Statistical
More informationMonte Carlo techniques for phrase-based translation
Monte Carlo techniques for phrase-based translation Abhishek Arun (a.arun@sms.ed.ac.uk) Barry Haddow (bhaddow@inf.ed.ac.uk) Philipp Koehn (pkoehn@inf.ed.ac.uk) Adam Lopez (alopez@inf.ed.ac.uk) University
More informationlogo The Prague Bulletin of Mathematical Linguistics NUMBER??? AUGUST Efficient Word Alignment with Markov Chain Monte Carlo
P B M L logo The Prague Bulletin of Mathematical Linguistics NUMBER??? AUGUST 2016 1 22 Efficient Word Alignment with Markov Chain Monte Carlo Robert Östling, Jörg Tiedemann Department of Modern Languages,
More informationPAPER Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation
1536 IEICE TRANS. INF. & SYST., VOL.E96 D, NO.7 JULY 2013 PAPER Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation Zezhong LI a, Member, Hideto IKEDA, Nonmember, and
More informationTnT Part of Speech Tagger
TnT Part of Speech Tagger By Thorsten Brants Presented By Arghya Roy Chaudhuri Kevin Patel Satyam July 29, 2014 1 / 31 Outline 1 Why Then? Why Now? 2 Underlying Model Other technicalities 3 Evaluation
More informationOverview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments
Overview Learning phrases from alignments 6.864 (Fall 2007) Machine Translation Part III A phrase-based model Decoding in phrase-based models (Thanks to Philipp Koehn for giving me slides from his EACL
More informationAlgorithms for NLP. Machine Translation II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Machine Translation II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Announcements Project 4: Word Alignment! Will be released soon! (~Monday) Phrase-Based System Overview
More informationCS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses)
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya CSE Dept., IIT Bombay 15 th Feb, 2011 Going forward
More informationCoverage Embedding Models for Neural Machine Translation
Coverage Embedding Models for Neural Machine Translation Haitao Mi Baskaran Sankaran Zhiguo Wang Abe Ittycheriah T.J. Watson Research Center IBM 1101 Kitchawan Rd, Yorktown Heights, NY 10598 {hmi, bsankara,
More informationRe-evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE
Re-evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE Yvette Graham ADAPT Centre School of Computer Science and Statistics Trinity College Dublin graham.yvette@gmail.com Abstract We provide
More informationMinimum Error Rate Training Semiring
Minimum Error Rate Training Semiring Artem Sokolov & François Yvon LIMSI-CNRS & LIMSI-CNRS/Univ. Paris Sud {artem.sokolov,francois.yvon}@limsi.fr EAMT 2011 31 May 2011 Artem Sokolov & François Yvon (LIMSI)
More informationGappy Phrasal Alignment by Agreement
Gappy Phrasal Alignment by Agreement Mohit Bansal UC Berkeley, CS Division mbansal@cs.berkeley.edu Chris Quirk Microsoft Research chrisq@microsoft.com Robert C. Moore Google Research robert.carter.moore@gmail.com
More informationChunking with Support Vector Machines
NAACL2001 Chunking with Support Vector Machines Graduate School of Information Science, Nara Institute of Science and Technology, JAPAN Taku Kudo, Yuji Matsumoto {taku-ku,matsu}@is.aist-nara.ac.jp Chunking
More informationStatistical NLP Spring Corpus-Based MT
Statistical NLP Spring 2010 Lecture 17: Word / Phrase MT Dan Klein UC Berkeley Corpus-Based MT Modeling correspondences between languages Sentence-aligned parallel corpus: Yo lo haré mañana I will do it
More informationCorpus-Based MT. Statistical NLP Spring Unsupervised Word Alignment. Alignment Error Rate. IBM Models 1/2. Problems with Model 1
Statistical NLP Spring 2010 Corpus-Based MT Modeling correspondences between languages Sentence-aligned parallel corpus: Yo lo haré mañana I will do it tomorrow Hasta pronto See you soon Hasta pronto See
More informationQuasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti
Quasi-Synchronous Phrase Dependency Grammars for Machine Translation Kevin Gimpel Noah A. Smith 1 Introduction MT using dependency grammars on phrases Phrases capture local reordering and idiomatic translations
More informationStatistical Substring Reduction in Linear Time
Statistical Substring Reduction in Linear Time Xueqiang Lü Institute of Computational Linguistics Peking University, Beijing lxq@pku.edu.cn Le Zhang Institute of Computer Software & Theory Northeastern
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Hemanta K Maji Raghavendra Udupa U IBM India Research Laboratory, New Delhi {hemantkm, uraghave}@inibmcom Abstract Viterbi
More informationLanguage Modelling. Marcello Federico FBK-irst Trento, Italy. MT Marathon, Edinburgh, M. Federico SLM MT Marathon, Edinburgh, 2012
Language Modelling Marcello Federico FBK-irst Trento, Italy MT Marathon, Edinburgh, 2012 Outline 1 Role of LM in ASR and MT N-gram Language Models Evaluation of Language Models Smoothing Schemes Discounting
More informationBringing machine learning & compositional semantics together: central concepts
Bringing machine learning & compositional semantics together: central concepts https://githubcom/cgpotts/annualreview-complearning Chris Potts Stanford Linguistics CS 244U: Natural language understanding
More informationInformation Extraction from Text
Information Extraction from Text Jing Jiang Chapter 2 from Mining Text Data (2012) Presented by Andrew Landgraf, September 13, 2013 1 What is Information Extraction? Goal is to discover structured information
More informationCross-Lingual Language Modeling for Automatic Speech Recogntion
GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The
More informationThis kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.
Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one
More informationA Comparison of Approaches for Geospatial Entity Extraction from Wikipedia
A Comparison of Approaches for Geospatial Entity Extraction from Wikipedia Daryl Woodward, Jeremy Witmer, and Jugal Kalita University of Colorado, Colorado Springs Computer Science Department 1420 Austin
More informationTALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto
TALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto Marta R. Costa-jussà, Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, José A.R. Fonollosa, José
More informationAnalyzing Burst of Topics in News Stream
1 1 1 2 2 Kleinberg LDA (latent Dirichlet allocation) DTM (dynamic topic model) DTM Analyzing Burst of Topics in News Stream Yusuke Takahashi, 1 Daisuke Yokomoto, 1 Takehito Utsuro 1 and Masaharu Yoshioka
More informationP R + RQ P Q: Transliteration Mining Using Bridge Language
P R + RQ P Q: Transliteration Mining Using Bridge Language Mitesh M. Khapra Raghavendra Udupa n Institute of Technology Microsoft Research, Bombay, Bangalore, Powai, Mumbai 400076 raghavu@microsoft.com
More informationA Fully-Lexicalized Probabilistic Model for Japanese Zero Anaphora Resolution
A Fully-Lexicalized Probabilistic Model for Japanese Zero Anaphora Resolution Ryohei Sasano Graduate School of Information Science and Technology, University of Tokyo ryohei@nlp.kuee.kyoto-u.ac.jp Daisuke
More informationCMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss
CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss Jeffrey Flanigan Chris Dyer Noah A. Smith Jaime Carbonell School of Computer Science, Carnegie Mellon University, Pittsburgh,
More informationTE4-2 30th Fuzzy System Symposium(Kochi,September 1-3,2014) Advertising Slogan Selection System Using Review Comments on the Web. 2 Masafumi Hagiwara
Web Advertising Slogan Selection System Using Review Comments on the Web 1 1 Hiroaki Yamane, 2 2 Masafumi Hagiwara 1 1 Graduate School of Science and Technology, Keio University Abstract: Increased demand
More informationCOMS 4705, Fall Machine Translation Part III
COMS 4705, Fall 2011 Machine Translation Part III 1 Roadmap for the Next Few Lectures Lecture 1 (last time): IBM Models 1 and 2 Lecture 2 (today): phrase-based models Lecture 3: Syntax in statistical machine
More informationA Deterministic Word Dependency Analyzer Enhanced With Preference Learning
A Deterministic Word Dependency Analyzer Enhanced With Preference Learning Hideki Isozaki and Hideto Kazawa and Tsutomu Hirao NTT Communication Science Laboratories NTT Corporation 2-4 Hikaridai, Seikacho,
More informationOrthonormal Explicit Topic Analysis for Cross-lingual Document Matching
Orthonormal Explicit Topic Analysis for Cross-lingual Document Matching John Philip M c Crae University Bielefeld Inspiration 1 Bielefeld, Germany Roman Klinger University Bielefeld Inspiration 1 Bielefeld,
More information