Cross-Lingual Language Modeling for Automatic Speech Recognition
1 GBO Presentation
Cross-Lingual Language Modeling for Automatic Speech Recognition
November 14, 2003
Woosung Kim
Center for Language and Speech Processing, Dept. of Computer Science
The Johns Hopkins University, Baltimore, MD 21218, USA
GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 1/19
3 Introduction
Motivation: the success of statistical modeling techniques
- Development of modeling and automatic learning techniques
- A large amount of data is available for training
- Most resources are in English, French, and German
- How can we construct stochastic models in resource-deficient languages?
Bootstrap from other languages, e.g.:
- Universal phone-set for ASR (Schultz & Waibel, 98; Byrne et al., 00)
- Exploit parallel texts to project morphological analyzers, POS taggers, etc. (Yarowsky, Ngai & Wicentowski, 01)
- Language modeling (this talk)
6 Introduction
We present:
- An approach to sharpen an LM in a resource-deficient language using comparable text from resource-rich languages
- Story-specific language models built from contemporaneous text
- An integration of machine translation (MT), cross-language information retrieval (CLIR), and language modeling (LM)
9 Language Models
Ex1: Optical Character Recognition ("All I am a student")
Ex2: Speech Recognition
- "It's [tu:] cold" -> too
- "He is [tu:]-years-old" -> two
Many speech & NLP applications deal with ungrammatical sentences/phrases
- Need to suppress ungrammatical outputs
- Assign a probability to a given sequence of words: P("He is two-years-old") > P("He is too-years-old")
11 Language Models: The ASR Problem
ASR problem: find the word string Ŵ = w_1, w_2, ..., w_n given acoustic evidence A.
Bayes' rule:

    Ŵ = arg max_W P(W|A)                    (1)
      = arg max_W P(W) P(A|W) / P(A)        (2)
      = arg max_W P(W) P(A|W)               (3)

where P(W) is the Language Model and P(A|W) is the Acoustic Model.
12 Language Models: Formulation

    P(w_1, w_2, ..., w_n) = P(w_1) P(w_2|w_1) P(w_3|w_1, w_2) ... P(w_n|w_1, ..., w_{n-1})
                          ≈ P(w_1) P(w_2|w_1) P(w_3|w_1, w_2) ... P(w_n|w_{n-2}, w_{n-1})
                          = P(w_1) P(w_2|w_1) Π_{i=3}^{n} P(w_i|w_{i-2}, w_{i-1})        (4)

    P(w_i|w_{i-2}, w_{i-1}) = N(w_{i-2}, w_{i-1}, w_i) / N(w_{i-2}, w_{i-1})  = Trigram  (5)
    P(w_i|w_{i-1})          = N(w_{i-1}, w_i) / N(w_{i-1})                    = Bigram   (6)
    P(w_i)                  = N(w_i) / Σ_{w∈V} N(w)                           = Unigram  (7)

where N(w_{i-2}, w_{i-1}, w_i) denotes the number of times the entry (w_{i-2}, w_{i-1}, w_i) appears in the training data and V is the vocabulary.
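The relative-frequency estimates in (5)-(7) amount to dividing two count tables. A minimal sketch (toy illustration with my own function names, not the system from the talk):

```python
from collections import defaultdict

def train_ngram_counts(tokens, n):
    """Count n-grams N(w_{i-n+1},...,w_i) and their (n-1)-gram histories."""
    counts, history_counts = defaultdict(int), defaultdict(int)
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] += 1
        history_counts[gram[:-1]] += 1
    return counts, history_counts

def mle_prob(w, history, counts, history_counts):
    """Relative-frequency estimate P(w | history) = N(history, w) / N(history)."""
    if history_counts[history] == 0:
        return 0.0
    return counts[history + (w,)] / history_counts[history]

tokens = "the cat sat on the mat the cat ran".split()
counts, hist = train_ngram_counts(tokens, n=2)
p = mle_prob("cat", ("the",), counts, hist)  # N(the, cat) / N(the, .) = 2/3
```

The zero-count case motivates the smoothing and interpolation used later in the talk: an unseen trigram gets probability 0 under pure MLE.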
13 Language Models: Evaluation
Word Error Rate (WER): performance of ASR given the LM

    REFERENCE:  UP UPSTATE NEW YORK SOMEWHERE UH OVER OVER     HUGE AREAS
    HYPOTHESIS:    UPSTATE NEW YORK SOMEWHERE UH ALL  ALL  THE HUGE AREAS

4 errors per 10 words in the reference; WER = 40%
- The ultimate measure, but expensive to compute
Perplexity (PPL): based on the cross entropy of test data D w.r.t. LM M

    H(P_D; P_M) = -Σ_{w∈V} P_D(w) log_2 P_M(w)                          (8)
                = -(1/N) Σ_{i=1}^{N} log_2 P_M(w_i|w_{i-2}, w_{i-1})    (9)

    PPL_M(D) = 2^{H(P_D; P_M)} = [P_M(w_1, ..., w_N)]^{-1/N}            (10)

For both WER and PPL, the lower, the better!
- Close correlation between WER and PPL
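Both metrics are easy to compute from per-token probabilities and a word-level alignment. A minimal sketch (hypothetical helper names; the WER call reproduces the slide's reference/hypothesis example):

```python
import math

def perplexity(token_probs):
    """PPL = 2^{-(1/N) sum_i log2 P(w_i | history)}, per Eqs. (9)-(10)."""
    n = len(token_probs)
    cross_entropy = -sum(math.log2(p) for p in token_probs) / n
    return 2.0 ** cross_entropy

def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

ref = "UP UPSTATE NEW YORK SOMEWHERE UH OVER OVER HUGE AREAS".split()
hyp = "UPSTATE NEW YORK SOMEWHERE UH ALL ALL THE HUGE AREAS".split()
rate = wer(ref, hyp)  # 1 deletion + 2 substitutions + 1 insertion = 4/10 = 0.4
```

Note that a model assigning uniform probability 1/k to every token has perplexity exactly k, which is why PPL is often read as an effective branching factor.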
14 Information Retrieval
Given a query doc, find relevant docs from the collection.
Vector-based IR:
- Represent each doc as a Bag-Of-Words (BOW)
- Convert the BOW into a vector of terms (air, automobile, bank, car, ...)
- All terms are treated as independent (orthogonal)
- Measure similarity between a query and docs
Similarity measure: cosine similarity (normalized inner product)

    sim(d_j, q) = Σ_{i=1}^{t} w_ij w_iq / ( √(Σ_{i=1}^{t} w_ij²) √(Σ_{i=1}^{t} w_iq²) )   (11)

where d_j = (w_1j, ..., w_tj) and q = (w_1q, ..., w_tq).
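Eq. (11) over raw term frequencies can be sketched in a few lines (a toy illustration, not a production IR engine):

```python
import math
from collections import Counter

def cosine_sim(doc_a, doc_b):
    """Eq. (11): dot product of term-frequency vectors over the product of norms."""
    a, b = Counter(doc_a), Counter(doc_b)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "car bank".split()
doc = "bank automobile bank".split()
sim = cosine_sim(query, doc)  # only "bank" overlaps; "car"/"automobile" contribute 0
```

The example makes the orthogonality assumption visible: "car" and "automobile" share no mass, which is exactly the synonymy problem raised on the next slide.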
15 Information Retrieval
Retrieval performance evaluation:

    Precision = |{Retrieved docs} ∩ {Relevant docs}| / |{Retrieved docs}|                    (12)
    Recall    = |{Retrieved docs} ∩ {Relevant docs}| / |{Relevant docs in the collection}|   (13)

Problems: terms are not orthogonal, so there are mismatches between queries and docs
- Polysemy, e.g., bank (❶ river, ❷ money, ...) hurts precision
- Synonymy, e.g., car, automobile, vehicle, ... hurts recall
- Latent Semantic Analysis (LSA) has been proposed to address this
Cross-Lingual IR: query & docs are in different languages
- Query translation approach
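Eqs. (12)-(13) reduce to set intersections over document IDs. A minimal sketch (hypothetical helper name):

```python
def precision_recall(retrieved, relevant):
    """Eqs. (12)-(13) over sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d7"})
# 2 of 4 retrieved docs are relevant (P = 0.5); 2 of 3 relevant docs found (R = 2/3)
```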
16 Cross-Lingual LM for ASR
System architecture (figure): a Mandarin story is decoded by the baseline ASR system (Chinese acoustic model, Chinese dictionary/vocabulary, baseline Chinese language model) into an automatic transcription d_i^C. Cross-Language Information Retrieval matches the transcription against contemporaneous English articles to find the English article d_i^E aligned with the Mandarin story, yielding P̂(e|d_i^E). Statistical machine translation supplies the translation lexicons P_T(c|e), which combine to form the cross-language unigram model P_CL-unigram(c|d_i^E).
17 Model Estimation
Assume the document correspondence d_i^E ↔ d_i^C is known. For Chinese test doc d_i^C,

    P_CL-unigram(c|d_i^E) = Σ_{e∈E} P_T(c|e) P̂(e|d_i^E),  ∀c ∈ C   (14)

Cross-Language LM construction:
- Build story-specific cross-language LMs, P(c|d_i^E)
- Linear interpolation with the baseline trigram LM:

    P_CL-interpolated(c_k|c_{k-1}, c_{k-2}, d_i^E)
        = λ P_CL-unigram(c_k|d_i^E) + (1-λ) P(c_k|c_{k-1}, c_{k-2})   (15)

- λ is optimized to minimize the PPL of held-out data via the EM algorithm
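Eq. (14) is a matrix-vector product over the translation table, and the λ in Eq. (15) is a two-component mixture weight that EM fits in a few lines. A minimal sketch under toy inputs (my own function names, not the talk's implementation):

```python
def cl_unigram(p_t, p_e_given_doc):
    """Eq. (14): P_CL(c|d) = sum_e P_T(c|e) * P(e|d)."""
    out = {}
    for e, p_e in p_e_given_doc.items():
        for c, p_ce in p_t.get(e, {}).items():
            out[c] = out.get(c, 0.0) + p_ce * p_e
    return out

def em_lambda(p_unigram, p_trigram, iters=50):
    """EM for the interpolation weight in Eq. (15), given per-token held-out probs."""
    lam = 0.5
    for _ in range(iters):
        # E-step: posterior that each token came from the unigram component
        post = [lam * u / (lam * u + (1 - lam) * t)
                for u, t in zip(p_unigram, p_trigram)]
        # M-step: the new weight is the average posterior
        lam = sum(post) / len(post)
    return lam

probs = cl_unigram({"e1": {"c1": 0.7, "c2": 0.3}}, {"e1": 1.0})
# Toy held-out data where the CL-unigram assigns higher probability: lambda grows
lam = em_lambda(p_unigram=[0.4] * 5, p_trigram=[0.1] * 5)
```

The E/M alternation is guaranteed not to decrease held-out likelihood, which is why λ can be tuned this way instead of by grid search.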
18 Model Estimation
Document correspondence obtained by CLIR:
- For each Chinese test doc d_i^C, create an English BOW
- Find the English doc with the highest cosine similarity:

    d_i^E = arg max_{d_j^E ∈ D^E} sim_CL(P(e|d_i^C), P̂(e|d_j^E))   (16)

Estimation of P_T(c|e) and P_T(e|c):
- GIZA++: statistical MT tool based on IBM Model 4
- Input: Hong Kong News Chinese-English sentence-aligned parallel corpus (18K docs, 200K sents, 4M wds each side)
- We only need the translation tables P_T(e|c) and P_T(c|e)
Alternatives:
- Mutual Information-based CL-triggers
- CL-LSA (Latent Semantic Analysis)
19 Cross-Lingual Lexical Triggers: Identification
Monolingual triggers: e.g., "either ... or"
Cross-lingual setting: translation lexicons based on Average Mutual Information, I(e; c). Let

    P(e, c) = #d(e, c) / N   (17)

where #d(e, c) denotes the number of aligned article pairs in which e and c co-occur, #d(e) denotes the number of English articles in which e occurs, and N is the number of article pairs, and let

    P(e) = #d(e) / N,   P(c|e) = P(e, c) / P(e)   (18)

    I(e; c) = P(e, c) log [P(c|e) / P(c)] + P(e, c̄) log [P(c̄|e) / P(c̄)]
            + P(ē, c) log [P(c|ē) / P(c)] + P(ē, c̄) log [P(c̄|ē) / P(c̄)]   (19)
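Eq. (19) sums one term per cell of the 2x2 contingency table of e and c over aligned article pairs. A minimal sketch from raw document counts (hypothetical helper name; the four cell probabilities follow directly from Eqs. 17-18):

```python
import math

def avg_mutual_info(n_ec, n_e, n_c, n_docs):
    """Eq. (19): I(e;c) from document co-occurrence counts (Eqs. 17-18)."""
    p_e, p_c = n_e / n_docs, n_c / n_docs
    p_ec = n_ec / n_docs
    # joint probabilities of the four (e present/absent, c present/absent) cells
    cells = [
        (p_ec,                 p_e,     p_c),      # (e, c)
        (p_e - p_ec,           p_e,     1 - p_c),  # (e, not-c)
        (p_c - p_ec,           1 - p_e, p_c),      # (not-e, c)
        (1 - p_e - p_c + p_ec, 1 - p_e, 1 - p_c),  # (not-e, not-c)
    ]
    total = 0.0
    for p_joint, p_x, p_y in cells:
        if p_joint > 0 and p_x > 0 and p_y > 0:
            # P(x, y) * log( P(y|x) / P(y) )
            total += p_joint * math.log((p_joint / p_x) / p_y)
    return total

# e and c co-occur in 35 of 100 aligned pairs: a strong trigger pair
i_dep = avg_mutual_info(n_ec=35, n_e=40, n_c=40, n_docs=100)
# 16 = 0.4 * 0.4 * 100 co-occurrences is exactly chance level
i_ind = avg_mutual_info(n_ec=16, n_e=40, n_c=40, n_docs=100)
```

When e and c are statistically independent, every ratio P(y|x)/P(y) is 1 and I(e; c) = 0, so ranking Chinese words by this score surfaces genuine cross-lingual triggers.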
20 Cross-Lingual Lexical Triggers: Estimation
We estimate the trigger-based CL unigram probabilities as

    P_Trig(c|e) = I(e; c) / Σ_{c'∈C} I(e; c')   (20)

Analogous to (14),

    P_Trig-unigram(c|d_i^E) = Σ_{e∈E} P_Trig(c|e) P̂(e|d_i^E)   (21)

Again, we build the interpolated model:

    P_Trig-interpolated(c_k|c_{k-1}, c_{k-2}, d_i^E)
        = λ P_Trig-unigram(c_k|d_i^E) + (1-λ) P(c_k|c_{k-1}, c_{k-2})   (22)
21 Latent Semantic Analysis for CLIR
Singular Value Decomposition (SVD) of the parallel corpus: stack the English and Chinese word-document matrices over the aligned document pairs (d_1^E ... d_N^E over d_1^C ... d_N^C) and factor

    W ≈ U S V^T    (W: M×N, U: M×R, S: R×R, V^T: R×N)

- Input: word-document frequency matrix W
- Reduce the dimension to a smaller but adequate subspace
- Singular Value Decomposition yields U, V, and S
- S: diagonal matrix with diagonal entries σ_1, ..., σ_k where σ_1 ≥ σ_2 ≥ ... ≥ σ_k (k ≥ R)
- Remove noisy entries by setting σ_i = 0 for i > R
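The truncation step (setting σ_i = 0 for i > R) can be sketched directly with NumPy (a toy 4-word x 3-document matrix, not the talk's corpus):

```python
import numpy as np

# Toy stacked word-document matrix W (English rows over Chinese rows,
# columns are aligned document pairs); entries are word frequencies.
W = np.array([[2., 0., 1.],
              [0., 3., 0.],
              [1., 0., 2.],
              [0., 2., 1.]])

U, s, Vt = np.linalg.svd(W, full_matrices=False)  # W = U diag(s) Vt, s descending

R = 2                                      # keep the R largest singular values
s_R = np.concatenate([s[:R], np.zeros(len(s) - R)])
W_R = U @ np.diag(s_R) @ Vt                # rank-R approximation of W
```

By the Eckart-Young theorem, W_R is the best rank-R approximation of W in Frobenius norm, and the approximation error equals the norm of the dropped singular values.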
24 Latent Semantic Analysis for CLIR
Folding-in a monolingual corpus: project new documents d_1^E, ..., d_P^E through the trained SVD

    W′ ≈ U S V′^T    (W′: M×P, V′^T: R×P)

- Given a monolingual corpus W′ on either side (English or Chinese)
- Use the same matrices U and S
- Project into the low-dimensional space: V′^T = S^{-1} U^T W′
- Compare a query and a document in the reduced-dimensional space
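The folding-in projection is a single matrix product against the trained factors. A minimal NumPy sketch (toy full-rank 3x3 training matrix; variable names are my own):

```python
import numpy as np

# Training-side SVD of a toy word-document matrix
W = np.array([[2., 0., 1.],
              [0., 3., 0.],
              [1., 0., 2.]])
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Folding-in: project a new monolingual document's term vector through
# the *same* U and S:  v' = S^{-1} U^T w'
w_new = np.array([1., 0., 1.])
v_new = np.diag(1.0 / s) @ U.T @ w_new

# Sanity check: folding a training document back in recovers its V column,
# since S^{-1} U^T W = V^T when W = U S V^T exactly.
v0 = np.diag(1.0 / s) @ U.T @ W[:, 0]
```

Queries and documents folded in this way live in the same R-dimensional space, so the cosine similarity of slide 14 applies unchanged, even across languages.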
25 Training and Test Corpora
Acoustic model training:
- HUB4-NE Mandarin training data (96K wds), 10 hours
Chinese monolingual language model training:
- XINHUA: 13M wds
- HUB4-NE: 96K wds
ASR test set: NIST HUB4-NE test data (F0 portion only)
- 1263 sents, 9.8K wds
English CLIR corpus: NAB-TDT
- NAB (1997 LA, WP) + TDT-2 (1998 APW, NYT)
- 45K docs, 30M wds
26 ASR Experimental Results
- Vocab: 51K for Chinese
- 300-best list rescoring
- Oracle best/worst WER: 33.4/94.4% for Xinhua and 39.7/95.5% for HUB4-NE

    Language Model (Xinhua)     Perp   WER   CER     p-value
    Trigram                            %     28.8%
    Trig-interpolated                  %     28.6%
    LSA-interpolated                   %     28.9%
    CL-interpolated                    %     28.4%   <

    Language Model (HUB4-NE)    Perp   WER   CER     p-value
    Trigram                            %     44.1%
    Trig-interpolated                  %     43.3%   <
    LSA-interpolated                   %     43.1%   <0.001
    CL-interpolated                    %     43.1%   <
27 Conclusions
- Exploits side-information from contemporaneous articles: useful for resource-deficient languages
- Statistically significant improvements in ASR WER
- With CL triggers & CL-LSA, a document-aligned corpus suffices rather than a sentence-aligned corpus
Future work:
- Extensions to higher-order N-grams (e.g., bigrams)
- Discriminative LMs; Word Sense Disambiguation for story-specific translation models
- Applications to other languages (e.g., Arabic) and other tasks (e.g., MT)
More informationINF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING. Crista Lopes
INF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING Crista Lopes Outline Precision and Recall The problem with indexing so far Intuition for solving it Overview of the solution The Math How to measure
More information1 Introduction. Sept CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall Consider the following examples:
Sept. 20-22 2005 1 CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall 2005 In this lecture we introduce some of the mathematical tools that are at the base of the technical approaches
More informationWord Alignment III: Fertility Models & CRFs. February 3, 2015
Word Alignment III: Fertility Models & CRFs February 3, 2015 Last Time... X p( Translation)= p(, Translation) Alignment = X Alignment Alignment p( p( Alignment) Translation Alignment) {z } {z } X z } {
More informationManning & Schuetze, FSNLP (c) 1999,2000
558 15 Topics in Information Retrieval (15.10) y 4 3 2 1 0 0 1 2 3 4 5 6 7 8 Figure 15.7 An example of linear regression. The line y = 0.25x + 1 is the best least-squares fit for the four points (1,1),
More informationFun with weighted FSTs
Fun with weighted FSTs Informatics 2A: Lecture 18 Shay Cohen School of Informatics University of Edinburgh 29 October 2018 1 / 35 Kedzie et al. (2018) - Content Selection in Deep Learning Models of Summarization
More informationLog-linear models (part 1)
Log-linear models (part 1) CS 690N, Spring 2018 Advanced Natural Language Processing http://people.cs.umass.edu/~brenocon/anlp2018/ Brendan O Connor College of Information and Computer Sciences University
More informationModeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process
Modeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process Songfang Huang and Steve Renals The Centre for Speech Technology Research University of Edinburgh, Edinburgh, EH8
More informationFoundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model
Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipop Koehn) 30 January
More informationA fast and simple algorithm for training neural probabilistic language models
A fast and simple algorithm for training neural probabilistic language models Andriy Mnih Joint work with Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 25 January 2013 1
More informationCS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya
CS 375 Advanced Machine Learning Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya Outline SVD and LSI Kleinberg s Algorithm PageRank Algorithm Vector Space Model Vector space model represents
More informationEmpirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model
Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model (most slides from Sharon Goldwater; some adapted from Philipp Koehn) 5 October 2016 Nathan Schneider
More informationSparse Models for Speech Recognition
Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations
More informationBoolean and Vector Space Retrieval Models
Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1
More informationperplexity = 2 cross-entropy (5.1) cross-entropy = 1 N log 2 likelihood (5.2) likelihood = P(w 1 w N ) (5.3)
Chapter 5 Language Modeling 5.1 Introduction A language model is simply a model of what strings (of words) are more or less likely to be generated by a speaker of English (or some other language). More
More informationDiscriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning
Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Zaihu Pang 1, Xihong Wu 1, and Lei Xu 1,2 1 Speech and Hearing Research Center, Key Laboratory of Machine
More informationNatural Language Processing and Recurrent Neural Networks
Natural Language Processing and Recurrent Neural Networks Pranay Tarafdar October 19 th, 2018 Outline Introduction to NLP Word2vec RNN GRU LSTM Demo What is NLP? Natural Language? : Huge amount of information
More informationFast Logistic Regression for Text Categorization with Variable-Length N-grams
Fast Logistic Regression for Text Categorization with Variable-Length N-grams Georgiana Ifrim *, Gökhan Bakır +, Gerhard Weikum * * Max-Planck Institute for Informatics Saarbrücken, Germany + Google Switzerland
More informationCS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses)
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya CSE Dept., IIT Bombay 15 th Feb, 2011 Going forward
More informationSpeech Recognition Lecture 5: N-gram Language Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri
Speech Recognition Lecture 5: N-gram Language Models Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Components Acoustic and pronunciation model: Pr(o w) =
More informationStatistical Machine Translation and Automatic Speech Recognition under Uncertainty
Statistical Machine Translation and Automatic Speech Recognition under Uncertainty Lambert Mathias A dissertation submitted to the Johns Hopkins University in conformity with the requirements for the degree
More informationLatent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization
Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Rachit Arora Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India.
More informationRecap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP
Recap: Language models Foundations of atural Language Processing Lecture 4 Language Models: Evaluation and Smoothing Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipp
More information