Speech Translation: from Singlebest to N-Best to Lattice Translation. Spoken Language Communication Laboratories
Transcription
1 Speech Translation: from Singlebest to N-Best to Lattice Translation. Ruiqiang ZHANG, Genichiro KIKUI. Spoken Language Communication Laboratories
2 Speech Translation Structure
- Single-best only: ASR → single-best → MT
- N-best hypothesis translation: ASR → N-best → MT
- Word lattice translation: X → ASR → J (word lattice) → WLT → E
3 References
- Ney (ICASSP 1999). Speech translation: Coupling of recognition and translation.
- Casacuberta (2002). Architectures for speech-to-speech translation using finite-state transducer.
- Zhang (Coling 2004). A unified approach in speech-to-speech translation.
- Saleem (ICSLP 2004). Using word lattice information for a tight coupling in speech translation systems.
- Matusov (Eurospeech 2005). On the Integration of Speech Recognition and SMT.
- Bozarov, Zhang (Eurospeech 2005). Speech Translation by Confidence Measure.
4 Outline
- N-best translation
- Word lattice translation
- IWSLT 2005 evaluation
- Conclusions
5 N-best Hypothesis Translation
X → ASR → $J_1, J_2, \ldots, J_N$ → SMT → $E_{n,1}, E_{n,2}, \ldots, E_{n,K}$ for each $J_n$ → Rescore → E
- $J_1, J_2, \ldots, J_N$: N-best speech recognition hypotheses
- $E_{2,1}, E_{2,2}, \ldots, E_{2,K}$: K-best translation hypotheses produced from $J_2$
- Rescore: rescores all N×K translations
6 Rescore: Integration of ASR and SMT
- Statistical theory
- Make approximations
$(\hat{E}, \hat{J}) = \arg\max_{E, J} \{ P(E)\, P(X \mid J)\, P(J \mid E) \}$
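The slide states the decision rule without the intermediate steps. One common way to arrive at it is sketched below; it assumes that the speech signal X depends on the translation E only through the source sentence J, and that the sum over J is replaced by a maximum. These two assumptions are the usual reading of "make approximations" and are not spelled out in the slide itself.

```latex
\begin{align*}
\hat{E} &= \arg\max_{E} P(E \mid X) = \arg\max_{E} P(E)\, P(X \mid E) \\
        &= \arg\max_{E} P(E) \sum_{J} P(J \mid E)\, P(X \mid J, E) \\
        &\approx \arg\max_{E} P(E) \sum_{J} P(J \mid E)\, P(X \mid J) \\
        &\approx \arg\max_{E} \max_{J} \, P(E)\, P(J \mid E)\, P(X \mid J)
\end{align*}
```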
7 Rescore: Log-linear models
$\hat{E} = \arg\max_{E} \sum_{m=1}^{M} \lambda_m \log P_m(X, E)$
- $P_m(X, E)$: m-th feature, in log value
- $\lambda_m$: weight of each feature
$P(E \mid X) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m f_m(X, E)\right)}{\sum_{E'} \exp\left(\sum_{m=1}^{M} \lambda_m f_m(X, E')\right)}$
- $E'$: all possible translation hypotheses
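A minimal sketch of log-linear rescoring over a pooled candidate list may make the two formulas concrete. The feature names (`asr`, `tm`, `lm`), their values, and the weights are hypothetical placeholders, not figures from the talk; the normalization runs over the candidate list only, standing in for "all possible translation hypotheses".

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of log-valued features: sum_m lambda_m * f_m(X, E)."""
    return sum(weights[name] * value for name, value in features.items())

def rescore(candidates, weights):
    """Return the best translation and its posterior over the candidate list."""
    scores = [loglinear_score(c["features"], weights) for c in candidates]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    normalizer = sum(math.exp(s - top) for s in scores)
    best_index = scores.index(top)
    return candidates[best_index]["translation"], 1.0 / normalizer

# Hypothetical pooled N x K list (here only two entries) with log-domain features.
candidates = [
    {"translation": "could I get a job",
     "features": {"asr": -2.0, "tm": -3.0, "lm": -4.0}},
    {"translation": "could you call an ambulance",
     "features": {"asr": -2.5, "tm": -2.0, "lm": -3.0}},
]
weights = {"asr": 1.0, "tm": 1.0, "lm": 1.0}

best_translation, best_posterior = rescore(candidates, weights)
print(best_translation, round(best_posterior, 4))
```

With equal weights the second candidate wins even though its ASR score is lower, which is the effect the N-best structure is meant to allow.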
8 Parameter optimization
Objective function: $\lambda_1^M = \operatorname{optimize}\; D(R_s, E_s)$
- $E_s$: translation output after log-linear model rescoring
- $R_s$: references of English sentences; 16 reference sentences for each English sentence
- $D(R_s, E_s)$: automatic translation quality metrics (BLEU, NIST, mWER and mPER)
9 Translation Assessment $D(R_s, E_s)$
N-gram methods
- BLEU: a weighted geometric mean of the n-gram matches between test and reference sentences, plus a short-sentence penalty
- NIST: an arithmetic mean of the n-gram matches between test and reference sentences
Word error rate
- mWER: multiple-reference word error rate
- mPER: multiple-reference position-independent word error rate
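The two error-rate metrics can be illustrated with a short sketch. It assumes one common convention: each rate is computed against every reference, normalized by that reference's length, and the lowest value is kept. The slide does not state the exact normalization, so treat this as an illustration rather than the evaluation code used in the talk.

```python
from collections import Counter

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur
    return prev[-1]

def wer(hyp, ref):
    return edit_distance(hyp, ref) / len(ref)

def per(hyp, ref):
    """Position-independent error rate: word order is ignored."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)

def multi_reference(metric, hyp, refs):
    """Score against the closest reference."""
    return min(metric(hyp, ref) for ref in refs)

hyp = "could you call ambulance an".split()
refs = ["could you call an ambulance".split(),
        "please call an ambulance".split()]
mwer = multi_reference(wer, hyp, refs)
mper = multi_reference(per, hyp, refs)
print(mwer, mper)
```

The hypothesis has the right words in the wrong order, so mPER is zero while mWER is not.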
10 Optimization: Direction Set Methods
[Flow diagram: change initial λ → change direction → local optimization of $D(R_s, E_s)$ → local λ → best λ]
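The loop in the diagram can be sketched as follows. This is a simplified variant that keeps fixed coordinate directions and uses step halving for the local optimization; it omits the direction update of Powell's method proper. The objective `toy_error` is a hypothetical stand-in for $D(R_s, E_s)$, treated as an error to be minimized (for BLEU or NIST one would negate the score).

```python
import random

def line_search(objective, lam, direction, step=1.0, min_step=1e-4, max_moves=10000):
    """Local optimization along one direction by step halving (lower is better)."""
    best = objective(lam)
    moves = 0
    while step >= min_step and moves < max_moves:
        improved = False
        for sign in (1.0, -1.0):
            cand = [l + sign * step * d for l, d in zip(lam, direction)]
            score = objective(cand)
            if score < best:
                lam, best, improved = cand, score, True
                moves += 1
                break
        if not improved:
            step *= 0.5
    return lam, best

def direction_set_search(objective, dim, restarts=5, sweeps=20, seed=0):
    rng = random.Random(seed)
    directions = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    best_lam, best_score = None, float("inf")
    for _ in range(restarts):                       # change initial lambda
        lam = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        score = objective(lam)
        for _ in range(sweeps):
            before = score
            for direction in directions:            # change direction
                lam, score = line_search(objective, lam, direction)
            if before - score < 1e-9:               # local lambda reached
                break
        if score < best_score:                      # keep the best lambda
            best_lam, best_score = lam, score
    return best_lam, best_score

def toy_error(lam):
    # Hypothetical smooth objective with its minimum at (0.3, -0.7).
    a, b = lam[0] - 0.3, lam[1] + 0.7
    return a * a + b * b + 0.5 * a * b

best_lam, best_score = direction_set_search(toy_error, dim=2)
print(best_lam, best_score)
```

Real translation metrics are piecewise constant in λ, which is one reason several initial points are tried.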
11 Features from ASR
- Acoustic model (AM) scores: Gaussian mixture output probability density function (pdf)
- Language model (LM) scores: N-gram language model
12 Features from Phrase-based SMT
- Target language model (trigram)
- Target class language model: SRILM cluster (5-gram)
- Target phrase language model
- Phrase translation model
- Distortion model
- Length model
- NULL word translation model
- Jump model
- Long-distance target LM (9-gram), for rescoring
- Long-distance class LM (11-gram)
13 Experimental Results of N-best Translation
[Figure: BLEU versus number of hypotheses]
14 Word Lattice Translation
15 Recognition Word Lattice
[Lattice figure between sentence boundaries /s; words shown: 経験者, 救急車, を, 呼ぶ, でもらえ, ます, か, 消え, 検査, で, もらえ]
- ASR first-best: 経験者を呼ぶでもらえますか
- First-best translation: Could I get a job
- ASR correct recognition: 救急車を呼ぶでもらえますか
- Word lattice translation: Could you call an ambulance
16 Machine Translation for Text
- Input E: could you call an ambulance
- NULL model: NULL could you call an ambulance
- Fertility model: NULL NULL could could call ambulance
- Lexical model: か を でもらえ ます 呼ぶ 救急車
- Distortion model: 救急車を呼ぶでもらえますか
17 Machine Translation for Lattice
- Input E: could {I | you} {get | call} {a | an} {job | ambulance}
- NULL model: NULL could {I | you} {get | call} {a | an} {job | ambulance}
- Fertility model: NULL NULL could could {get | call} {job | ambulance}
18 Machine Translation for Lattice
- Lexical model: {経験者 | 救急車} か を でもらえ ます 呼ぶ
- Distortion model: {経験者 | 救急車} を呼ぶでもらえますか
19 How We Translate Word Lattice
Two-step decoding: beam search + A* search
- Beam search: constructs the translation word graph (TWG)
  - An edge in the word lattice is mapped to an edge in the TWG
  - A path in the TWG corresponds to a path in the word lattice
  - Lower-scored edges are pruned
  - Simple translation models are used
- A* search: searches the TWG with higher-grade translation models (IBM Model 4)
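The first decoding step can be pictured with a small sketch. It maps each lattice edge to a few target-word edges that keep the same start and end nodes, so a TWG path still corresponds to a lattice path, and it prunes low-scoring candidates. The lexicon, all scores, and the two pruning parameters (`beam_size`, `threshold`) are hypothetical; the slide does not say which simple models or pruning settings were used.

```python
def build_twg(lattice_edges, lexicon, beam_size=2, threshold=2.0):
    """Map word-lattice edges to translation-word-graph edges with pruning.

    lattice_edges: (start_node, end_node, source_word, asr_log_score)
    lexicon: source_word -> list of (target_word, translation_log_prob)
    """
    twg = []
    for start, end, source, asr_score in lattice_edges:
        options = sorted(
            ((asr_score + tm_score, target)
             for target, tm_score in lexicon.get(source, [])),
            reverse=True,
        )
        if not options:
            continue
        best = options[0][0]
        for score, target in options[:beam_size]:      # beam pruning
            if best - score <= threshold:              # threshold pruning
                twg.append((start, end, target, score))
    return twg

# Hypothetical lattice fragment and lexicon entries.
lattice_edges = [
    (0, 1, "経験者", -1.0),
    (0, 1, "救急車", -1.2),
    (1, 2, "を", -0.1),
    (2, 3, "呼ぶ", -0.3),
]
lexicon = {
    "経験者": [("job", -0.5), ("experience", -0.7), ("expert", -3.5)],
    "救急車": [("ambulance", -0.2)],
    "を": [("NULL", -0.1)],
    "呼ぶ": [("call", -0.3), ("get", -1.0), ("invite", -4.0)],
}

twg = build_twg(lattice_edges, lexicon)
targets = {target for _, _, target, _ in twg}
print(sorted(targets))
```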
20 Illustration of Constructing TWG (Translation Word Graph)
Beam search: threshold pruning
[Figure: source words 経験者, 救急車, 呼ぶ, でもらえ, ます; candidate target words at each step: job, ambulance, call, get, can, could, you, I]
21 Translation Word Graph (example)
22 A* Search
- Forward score: accumulated from the start node to the current node, using IBM Model 4
- Heuristic score: accumulated from the current node to the end node
- Approximations are made for the models that depend on the length of the source sentence: the distortion model and the NULL word model
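A toy version of the second decoding step is sketched below. The forward score combines the TWG edge score with an extra model score, and the heuristic is the best edge-score sum from each node to the end node. The `pair_penalty` table is a hypothetical stand-in for the higher-grade model (it is not IBM Model 4); because it never adds a positive value, the heuristic stays optimistic and the first complete path popped is the best one.

```python
import heapq

def best_completion(graph, end):
    """Heuristic score: best edge-score sum from each node to the end node."""
    memo = {end: 0.0}
    def visit(node):
        if node not in memo:
            memo[node] = max(
                (score + visit(nxt) for nxt, _, score in graph.get(node, [])),
                default=float("-inf"),
            )
        return memo[node]
    for node in graph:
        visit(node)
    return memo

def a_star(graph, start, end, extra_score):
    heuristic = best_completion(graph, end)
    # heapq is a min-heap, so priorities are negated scores.
    heap = [(-heuristic[start], 0.0, start, ())]
    while heap:
        _, forward, node, words = heapq.heappop(heap)
        if node == end:
            return list(words), forward
        for nxt, word, score in graph.get(node, []):
            new_forward = forward + score + extra_score(words, word)
            priority = new_forward + heuristic[nxt]
            heapq.heappush(heap, (-priority, new_forward, nxt, words + (word,)))
    return None, float("-inf")

# Hypothetical TWG: node -> list of (next_node, target_word, log_score).
graph = {
    0: [(1, "could", -0.1)],
    1: [(2, "I", -0.3), (2, "you", -0.4)],
    2: [(3, "get", -0.5), (3, "call", -0.5)],
    3: [(4, "a job", -0.4), (4, "an ambulance", -0.5)],
}
# Hypothetical non-positive penalties for unlikely word pairs.
pair_penalty = {("I", "get"): -0.6, ("you", "get"): -1.0, ("I", "call"): -1.0,
                ("get", "an ambulance"): -1.0, ("call", "a job"): -1.0}

def extra_score(words, word):
    previous = words[-1] if words else None
    return pair_penalty.get((previous, word), 0.0)

best_words, best_forward = a_star(graph, 0, 4, extra_score)
print(" ".join(best_words), best_forward)
```

On edge scores alone the path "could I get a job" would win; the extra model score moves the best path to "could you call an ambulance".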
23 Features in Speech Translation Models
$\hat{E} = \arg\max_{E} \{ \lambda_0 \log P_{pp} + \lambda_1 \log P_{lm}(E) + \lambda_2 \log P_{lm}(POS(E)) + \lambda_3 \log P(\phi_0) + \lambda_4 \log N(\Phi \mid E) + \lambda_5 \log T(J \mid E) + \lambda_6 \log D(E, J) \}$
24 Effect of Word Lattice Translation
[Figure: BLEU for 1st best and lattice, versus #N-best in lattice]
25 Beam-size effect in WLT
[Figure comparing single-best translation and word lattice translation. 1: a translation of the 1st-best ASR hypothesis; 2: a translation of the 2nd-best ASR hypothesis; 3: a translation of the 3rd-best ASR hypothesis]
26 Beam-size Effects in WLT (N-best = 20)
Promising hypotheses are pruned in WLT but kept in single-best translation under the same beam size.
[Figure: BLEU versus beam size, 1-best and 20-best]
27 Why Word Lattice Minimization
- The raw lattice is too large
- There are many duplicated word IDs in the lattice
- Only the top N-best hypotheses are significant
- Minimization is carried out with machine translation in mind
- Minimization can make decoding faster
- Minimization can reduce translation errors by reducing pruning errors in decoding
28 Word Lattice Minimization
29 Word Lattice? N-best?
- After lattice minimization, the output is no longer a lattice, only an N-best list with newly assigned edge IDs.
- After lattice minimization, the ASR score of a single edge is lost. Instead, we use the ASR path score to represent a single edge's score.
30 Effect of Lattice Minimization
[Figure: 1st best, lattice without minimization, and lattice with minimization, versus #N-best in lattice]
31 Posterior Probability
- Integrates acoustic model and language model probabilities
- Indicates the relative accuracy of the N-best hypotheses
$p(J_j \mid X) = \frac{e^{\lambda \log \mathrm{score}_j}}{\sum_{i=1}^{N} e^{\lambda \log \mathrm{score}_i}}$
- $\log \mathrm{score}_i$: log-scale ASR score (AM + LM)
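The posterior formula is a scaled softmax over the N-best list and can be sketched in a few lines. The log scores and the scaling factor below are made-up values; subtracting the maximum before exponentiating is a standard numerical safeguard and does not change the result.

```python
import math

def nbest_posteriors(log_scores, scale=1.0):
    """p(J_j | X) = exp(scale * log_score_j) / sum_i exp(scale * log_score_i)."""
    scaled = [scale * s for s in log_scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical combined AM + LM log scores for a 3-best list.
posteriors = nbest_posteriors([-100.0, -101.0, -105.0], scale=0.5)
print([round(p, 4) for p in posteriors])
```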
32 Confidence Measure Filtering
- ASR hypotheses with very low posterior probability degrade translations
- A predefined confidence threshold, T, is applied to remove the most unlikely ASR hypotheses
- Each hypothesis's posterior probability is compared with the single-best hypothesis's posterior probability multiplied by T ($P_{\text{first-best}} \times T$); hypotheses that fall below this value are removed
33 Confidence Measure Filtering
Columns: ASR output; ASR score; PP = posterior probability; cmf = PP / PP_1st; decision (cmf > 0.5?)
- 1st to 5th candidates: PASS
- 6th to 8th candidates: FAIL
[numeric values not preserved]
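The decision column of the table can be reproduced with a short filter. This sketch assumes the posterior list is ordered with the single-best hypothesis first and uses the strict comparison `cmf > threshold` shown in the table header; the hypothesis labels and posterior values are invented for illustration.

```python
def confidence_filter(hypotheses, posteriors, threshold=0.5):
    """Keep hypotheses whose posterior exceeds threshold * first-best posterior."""
    first_best = posteriors[0]
    kept = []
    for hypothesis, pp in zip(hypotheses, posteriors):
        cmf = pp / first_best          # cmf = PP / PP_1st
        if cmf > threshold:            # PASS
            kept.append(hypothesis)
    return kept

# Hypothetical 4-best list with posterior probabilities.
hypotheses = ["cand 1", "cand 2", "cand 3", "cand 4"]
posteriors = [0.40, 0.30, 0.21, 0.09]
kept = confidence_filter(hypotheses, posteriors, threshold=0.5)
print(kept)
```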
34 Effect of CM filtering
[Figure: 1st best, N-best, Lattice, Min., w/ CMF]
35 IWSLT 2005 Evaluation
36 IWSLT 2005 Evaluation (training data)
Columns: language pair; data track; data size; perplexity (test set, dev. data)
- C/E, Supplied+tagger: 20K
- C/E, C-star: 172K
- J/E, Supplied+tagger: 20K
- J/E, C-star: 463K
[perplexity values not preserved]
37 Test Data Analysis
ASR recognition accuracy: Japanese (N=1, N=) and Chinese (N=1, N=20) [values not preserved]
38 Test Data Results (J/E, BLEU, C-star track)
[Figure: BLEU for Text, N-best, Lattice and 1-best input]
39 Test Data Results (J/E, NIST, C-star track)
[Figure: NIST for Text, N-best, Lattice and 1-best input]
40 Test Data Results (J/E, WER, C-star track)
[Figure: WER for Text, N-best, Lattice and 1-best input]
41 Evaluation Results (CE)
Columns: data track; input; BLEU; NIST; WER; PER; METEOR; GTM
- Supplied+tools: Text, Nbest, Sbest
- Cstar: Text, Nbest, Sbest
[scores not preserved]
42 Evaluation Results (JE)
Columns: data track; input; BLEU; NIST; WER; PER; METEOR; GTM
- Supplied+tagger: Text, Nbest, Lattice, Sbest
- Cstar: Text, Nbest, Lattice, Sbest
[scores not preserved]
43 Remarks
- Text translation (0.727) > N-best translation (0.679)
- N-best translation (0.679) > lattice translation (0.670)
- Lattice translation (0.670) > single-best translation (0.646)
- Training data size influences speech translation
44 Analysis: Lattice Translation Worse than N-best Translation
- We used the same number of ASR hypotheses in N-best translation and lattice translation
- In beam search, N-best translation and lattice translation used the same beam size and pruning threshold
- Model approximations and inaccuracies: distortion, NULL, acoustic model, language model
45 Comparisons of the structures
Single-best translation
- Simple, direct
- ASR and SMT optimized in isolation
- MT flexible, easy to upgrade, multiple translation engines
- Not robust to ASR WER
N-best hypothesis translation
- Robust, resistant to ASR WER
- MT flexible, multiple translation engines
- Slow, duplicate calculation
Word lattice translation
- Reduced computing cost, efficient
- Speech translation system (ASR and SMT) optimized as a whole
- MT inflexible
46 Conclusions
- We applied two approaches to improve on ASR single-best translation.
- By applying a log-linear model, the N-best translation approach improves on single-best translation effectively.
- We observed improved speech translation performance in word lattice translation with confidence measure filtering and word lattice reduction.
Multiple System Combination. Jinhua Du CNGL July 23, 2008
Multiple System Combination Jinhua Du CNGL July 23, 2008 Outline Introduction Motivation Current Achievements Combination Strategies Key Techniques System Combination Framework in IA Large-Scale Experiments
More informationStatistical Phrase-Based Speech Translation
Statistical Phrase-Based Speech Translation Lambert Mathias 1 William Byrne 2 1 Center for Language and Speech Processing Department of Electrical and Computer Engineering Johns Hopkins University 2 Machine
More informationTALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto
TALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto Marta R. Costa-jussà, Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, José A.R. Fonollosa, José
More informationPhrase-Based Statistical Machine Translation with Pivot Languages
Phrase-Based Statistical Machine Translation with Pivot Languages N. Bertoldi, M. Barbaiani, M. Federico, R. Cattoni FBK, Trento - Italy Rovira i Virgili University, Tarragona - Spain October 21st, 2008
More informationStatistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks
Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School - University of Pisa Pisa, 7-19 May 008 Part III: Search Problem 1 Complexity issues A search: with single
More informationAutomatic Speech Recognition and Statistical Machine Translation under Uncertainty
Outlines Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Lambert Mathias Advisor: Prof. William Byrne Thesis Committee: Prof. Gerard Meyer, Prof. Trac Tran and Prof.
More informationCross-Lingual Language Modeling for Automatic Speech Recogntion
GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The
More informationConditional Language Modeling. Chris Dyer
Conditional Language Modeling Chris Dyer Unconditional LMs A language model assigns probabilities to sequences of words,. w =(w 1,w 2,...,w`) It is convenient to decompose this probability using the chain
More informationGoogle s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill
More informationThe Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech
CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given
More informationQuasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti
Quasi-Synchronous Phrase Dependency Grammars for Machine Translation Kevin Gimpel Noah A. Smith 1 Introduction MT using dependency grammars on phrases Phrases capture local reordering and idiomatic translations
More informationChapter 3: Basics of Language Modelling
Chapter 3: Basics of Language Modelling Motivation Language Models are used in Speech Recognition Machine Translation Natural Language Generation Query completion For research and development: need a simple
More informationAugmented Statistical Models for Speech Recognition
Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent
More informationDecoding Revisited: Easy-Part-First & MERT. February 26, 2015
Decoding Revisited: Easy-Part-First & MERT February 26, 2015 Translating the Easy Part First? the tourism initiative addresses this for the first time the die tm:-0.19,lm:-0.4, d:0, all:-0.65 tourism touristische
More informationDecoding in Statistical Machine Translation. Mid-course Evaluation. Decoding. Christian Hardmeier
Decoding in Statistical Machine Translation Christian Hardmeier 2016-05-04 Mid-course Evaluation http://stp.lingfil.uu.se/~sara/kurser/mt16/ mid-course-eval.html Decoding The decoder is the part of the
More informationORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way
More informationFast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Joern Wuebker, Hermann Ney Human Language Technology and Pattern Recognition Group Computer Science
More informationAnalysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation
Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology
More informationMachine Translation: Examples. Statistical NLP Spring Levels of Transfer. Corpus-Based MT. World-Level MT: Examples
Statistical NLP Spring 2009 Machine Translation: Examples Lecture 17: Word Alignment Dan Klein UC Berkeley Corpus-Based MT Levels of Transfer Modeling correspondences between languages Sentence-aligned
More informationA Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister
A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model
More informationAlgorithms for NLP. Machine Translation II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Machine Translation II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Announcements Project 4: Word Alignment! Will be released soon! (~Monday) Phrase-Based System Overview
More informationStatistical Machine Translation and Automatic Speech Recognition under Uncertainty
Statistical Machine Translation and Automatic Speech Recognition under Uncertainty Lambert Mathias A dissertation submitted to the Johns Hopkins University in conformity with the requirements for the degree
More informationN-gram Language Modeling Tutorial
N-gram Language Modeling Tutorial Dustin Hillard and Sarah Petersen Lecture notes courtesy of Prof. Mari Ostendorf Outline: Statistical Language Model (LM) Basics n-gram models Class LMs Cache LMs Mixtures
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationOut of GIZA Efficient Word Alignment Models for SMT
Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza
More informationStatistical NLP Spring HW2: PNP Classification
Statistical NLP Spring 2010 Lecture 16: Word Alignment Dan Klein UC Berkeley HW2: PNP Classification Overall: good work! Top results: 88.1: Matthew Can (word/phrase pre/suffixes) 88.1: Kurtis Heimerl (positional
More informationHW2: PNP Classification. Statistical NLP Spring Levels of Transfer. Phrasal / Syntactic MT: Examples. Lecture 16: Word Alignment
Statistical NLP Spring 2010 Lecture 16: Word Alignment Dan Klein UC Berkeley HW2: PNP Classification Overall: good work! Top results: 88.1: Matthew Can (word/phrase pre/suffixes) 88.1: Kurtis Heimerl (positional
More informationTriplet Lexicon Models for Statistical Machine Translation
Triplet Lexicon Models for Statistical Machine Translation Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer lastname@cs.rwth-aachen.de CLSP Student Seminar February 6, 2009 Human Language
More informationAdapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization
Adapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur Human Language Technology Center of Excellence Center for Language
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Raghavendra Udupa U Hemanta K Mai IBM India Research Laboratory, New Delhi {uraghave, hemantkm}@inibmcom Abstract Viterbi
More informationChapter 3: Basics of Language Modeling
Chapter 3: Basics of Language Modeling Section 3.1. Language Modeling in Automatic Speech Recognition (ASR) All graphs in this section are from the book by Schukat-Talamazzini unless indicated otherwise
More informationRecent Developments in Statistical Dialogue Systems
Recent Developments in Statistical Dialogue Systems Steve Young Machine Intelligence Laboratory Information Engineering Division Cambridge University Engineering Department Cambridge, UK Contents Review
More informationEfficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices Graeme Blackwood, Adrià de Gispert, William Byrne Machine Intelligence Laboratory Cambridge
More informationVariational Decoding for Statistical Machine Translation
Variational Decoding for Statistical Machine Translation Zhifei Li, Jason Eisner, and Sanjeev Khudanpur Center for Language and Speech Processing Computer Science Department Johns Hopkins University 1
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationEvaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018
Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence
More informationDiscriminative Training
Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder
More informationStatistical NLP Spring Corpus-Based MT
Statistical NLP Spring 2010 Lecture 17: Word / Phrase MT Dan Klein UC Berkeley Corpus-Based MT Modeling correspondences between languages Sentence-aligned parallel corpus: Yo lo haré mañana I will do it
More informationCorpus-Based MT. Statistical NLP Spring Unsupervised Word Alignment. Alignment Error Rate. IBM Models 1/2. Problems with Model 1
Statistical NLP Spring 2010 Corpus-Based MT Modeling correspondences between languages Sentence-aligned parallel corpus: Yo lo haré mañana I will do it tomorrow Hasta pronto See you soon Hasta pronto See
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 18: Search & Decoding (Part I) Instructor: Preethi Jyothi Mar 23, 2017 Recall ASR Decoding W = arg max W Pr(O A W )Pr(W ) W = arg max w N 1,N 8" < Y N : n=1
More informationApplied Natural Language Processing
Applied Natural Language Processing Info 256 Lecture 7: Testing (Feb 12, 2019) David Bamman, UC Berkeley Significance in NLP You develop a new method for text classification; is it better than what comes
More informationMinimum Error Rate Training Semiring
Minimum Error Rate Training Semiring Artem Sokolov & François Yvon LIMSI-CNRS & LIMSI-CNRS/Univ. Paris Sud {artem.sokolov,francois.yvon}@limsi.fr EAMT 2011 31 May 2011 Artem Sokolov & François Yvon (LIMSI)
More informationEnhanced Bilingual Evaluation Understudy
Enhanced Bilingual Evaluation Understudy Krzysztof Wołk, Krzysztof Marasek Department of Multimedia Polish Japanese Institute of Information Technology, Warsaw, POLAND kwolk@pjwstk.edu.pl Abstract - Our
More informationDoctoral Course in Speech Recognition. May 2007 Kjell Elenius
Doctoral Course in Speech Recognition May 2007 Kjell Elenius CHAPTER 12 BASIC SEARCH ALGORITHMS State-based search paradigm Triplet S, O, G S, set of initial states O, set of operators applied on a state
More informationDetection-Based Speech Recognition with Sparse Point Process Models
Detection-Based Speech Recognition with Sparse Point Process Models Aren Jansen Partha Niyogi Human Language Technology Center of Excellence Departments of Computer Science and Statistics ICASSP 2010 Dallas,
More informationWelcome to Pittsburgh!
Welcome to Pittsburgh! Overview of the IWSLT 2005 Evaluation Campaign atthias Eck and Chiori Hori InterACT Carnegie ellon niversity IWSLT2005 Evaluation campaign Working on the same test bed Training Corpus
More informationTopics in Natural Language Processing
Topics in Natural Language Processing Shay Cohen Institute for Language, Cognition and Computation University of Edinburgh Lecture 9 Administrativia Next class will be a summary Please email me questions
More informationUnsupervised Model Adaptation using Information-Theoretic Criterion
Unsupervised Model Adaptation using Information-Theoretic Criterion Ariya Rastrow 1, Frederick Jelinek 1, Abhinav Sethy 2 and Bhuvana Ramabhadran 2 1 Human Language Technology Center of Excellence, and
More informationHidden Markov Model and Speech Recognition
1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed
More informationLanguage Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky
Language Modeling Introduction to N-grams Many Slides are adapted from slides by Dan Jurafsky Probabilistic Language Models Today s goal: assign a probability to a sentence Why? Machine Translation: P(high
More informationUsing a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs
Using a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs Yasuhiro Akiba,, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, and Hiroshi G. Okuno ATR
More informationStatistical Ranking Problem
Statistical Ranking Problem Tong Zhang Statistics Department, Rutgers University Ranking Problems Rank a set of items and display to users in corresponding order. Two issues: performance on top and dealing
More informationAn Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz
More information] Automatic Speech Recognition (CS753)
] Automatic Speech Recognition (CS753) Lecture 17: Discriminative Training for HMMs Instructor: Preethi Jyothi Sep 28, 2017 Discriminative Training Recall: MLE for HMMs Maximum likelihood estimation (MLE)
More informationStatistical Machine Translation
Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School University of Pisa Pisa, 7-19 May 2008 Part V: Language Modeling 1 Comparing ASR and statistical MT N-gram
More informationTuning as Linear Regression
Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical
More informationCS230: Lecture 10 Sequence models II
CS23: Lecture 1 Sequence models II Today s outline We will learn how to: - Automatically score an NLP model I. BLEU score - Improve Machine II. Beam Search Translation results with Beam search III. Speech
More informationA Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions Wei Lu and Hwee Tou Ng National University of Singapore 1/26 The Task (Logical Form) λx 0.state(x 0
More informationThe Noisy Channel Model and Markov Models
1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle
More informationMachine Translation without Words through Substring Alignment
Machine Translation without Words through Substring Alignment Graham Neubig 1,2,3, Taro Watanabe 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 2 3 now at 1 Machine Translation Translate a source sentence F
More informationLanguage Modelling. Marcello Federico FBK-irst Trento, Italy. MT Marathon, Edinburgh, M. Federico SLM MT Marathon, Edinburgh, 2012
Language Modelling Marcello Federico FBK-irst Trento, Italy MT Marathon, Edinburgh, 2012 Outline 1 Role of LM in ASR and MT N-gram Language Models Evaluation of Language Models Smoothing Schemes Discounting
More informationFoundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model
Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipop Koehn) 30 January
More informationImproved Decipherment of Homophonic Ciphers
Improved Decipherment of Homophonic Ciphers Malte Nuhn and Julian Schamper and Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department, RWTH Aachen University, Aachen,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationMachine Translation Evaluation
Machine Translation Evaluation Sara Stymne 2017-03-29 Partly based on Philipp Koehn s slides for chapter 8 Why Evaluation? How good is a given machine translation system? Which one is the best system for
More informationMaximal Lattice Overlap in Example-Based Machine Translation
Maximal Lattice Overlap in Example-Based Machine Translation Rebecca Hutchinson Paul N. Bennett Jaime Carbonell Peter Jansen Ralf Brown June 6, 2003 CMU-CS-03-138 School of Computer Science Carnegie Mellon
More informationAdversarial Training and Decoding Strategies for End-to-end Neural Conversation Models
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Adversarial Training and Decoding Strategies for End-to-end Neural Conversation Models Hori, T.; Wang, W.; Koji, Y.; Hori, C.; Harsham, B.A.;
More informationWhy DNN Works for Acoustic Modeling in Speech Recognition?
Why DNN Works for Acoustic Modeling in Speech Recognition? Prof. Hui Jiang Department of Computer Science and Engineering York University, Toronto, Ont. M3J 1P3, CANADA Joint work with Y. Bao, J. Pan,
More informationEnd-to-end Automatic Speech Recognition
End-to-end Automatic Speech Recognition Markus Nussbaum-Thom IBM Thomas J. Watson Research Center Yorktown Heights, NY 10598, USA Markus Nussbaum-Thom. February 22, 2017 Nussbaum-Thom: IBM Thomas J. Watson
More informationLanguage Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky
Language Modeling Introduction to N-grams Many Slides are adapted from slides by Dan Jurafsky Probabilistic Language Models Today s goal: assign a probability to a sentence Why? Machine Translation: P(high
More informationSpeech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)
Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation
More information1 Evaluation of SMT systems: BLEU