Cross-Lingual Language Modeling for Automatic Speech Recogntion

Size: px
Start display at page:

Download "Cross-Lingual Language Modeling for Automatic Speech Recogntion"

Transcription

1 GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim Center for Language and Speech Processing Dept. of Computer Science The Johns Hopkins University, Baltimore, MD 21218, USA GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 1/19

2 Introduction Motivation : Success of statistical modeling techniques Development of modeling and automatic learning techniques A large amount of data for training is available Most resources on English, French and German How to construct stochastic models in resource-deficient languages? GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 2/19

3 Introduction Motivation : Success of statistical modeling techniques Development of modeling and automatic learning techniques A large amount of data for training is available Most resources on English, French and German How to construct stochastic models in resource-deficient languages? Bootstrap from other languages, e.g. Universal phone-set for ASR (Schultz & Waibel, 98, Byrne et al, 00) Exploit parallel texts to project morphological analyzers, POS taggers, etc. (Yarowsky, Ngai & Wicentowski, 01) Language modeling (this talk) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 2/19

4 Introduction We present: An approach to sharpen an LM in a resource-deficient language using comparable text from resource-rich languages GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 3/19

5 Introduction We present: An approach to sharpen an LM in a resource-deficient language using comparable text from resource-rich languages Story-specific language models from contemporaneous text GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 3/19

6 Introduction We present: An approach to sharpen an LM in a resource-deficient language using comparable text from resource-rich languages Story-specific language models from contemporaneous text Integration of machine translation (MT), cross-language information retrieval (CLIR), and language modeling (LM) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 3/19

7 Langauge Models Ex1: Optical Character Recognition All Iam a student GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 4/19

8 Langauge Models Ex1: Optical Character Recognition Ex2: Speech Recognition All I am a student It s [tu:] cold He is [tu:]-years-old GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 4/19

9 Langauge Models Ex1: Optical Character Recognition Ex2: Speech Recognition All I am a student It s [tu:] cold too He is [tu:]-years-old two Many speech & NLP applications deal with ungrammatical sentences/phrases Need to suppress ungrammatical outputs Assign a probability to given a sequence of word strings P( He is two-years-old ) P( He is too-years-old ) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 4/19

10 Langauge Models: ASR problem ASR problem : Finding word string Ŵ = w 1, w 2,, w n given acoustic evidence A Bayes Rule Ŵ = arg max W P(W A) = P(W)P(A W) P(A) Ŵ = arg max W P(W A) (1) (2) P(W)P(A W) (3) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 5/19

11 Langauge Models: ASR problem ASR problem : Finding word string Ŵ = w 1, w 2,, w n given acoustic evidence A Bayes Rule Ŵ = arg max W P(W A) = P(W)P(A W) P(A) Ŵ = arg max W P(W A) (1) (2) P(W)P(A W) (3) Language Model Acoustic Model GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 5/19

12 Langauge Models: Formulation P(w 1, w 2,, w n ) = P(w 1 )P(w 2 w 1 )P(w 3 w 1, w 2 ) P(w n w 1,, w n 1 ) P(w 1 )P(w 2 w 1 )P(w 3 w 1, w 2 ) P(w n w n 2, w n 1 ) n = P(w 1 )P(w 2 w 1 ) P(w i w i 2, w i 1 ) (4) i=3 P(w i w i 2, w i 1 ) = N(w i 2, w i 1, w i ) N(w i 2, w i 1 ) P(w i w i 1 ) = N(w i 1, w i ) N(w i 1 ) P(w i ) = = Trigram (5) = Bigram (6) N(w i ) w V N(w) = Unigram (7) where N(w i 2, w i 1, w i ) denotes the number of times entry (w i 2, w i 1, w i ) appears in the training data and V is the vocabulary. GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 6/19

13 Langauge Models: Evaluation Word Error Rate (WER): Performace of ASR given LM REFERENCE: UP UPSTATE NEW YORK SOMEWHERE UH OVER OVER HUGE AREAS HYPOTHESIS: UPSTATE NEW YORK SOMEWHERE UH ALL ALL THE HUGE AREAS COR(0)/ERR(1): :4 errors per 10 words in reference; WER = 40% Ultimate measure, but expensive to compute Perplexity (PPL): based on cross entropy of test data D w.r.t. LM M H(P D ; P M ) = w V P D (w) log 2 P M (w) (8) = 1 N N log 2 P M (w i w i 2, w i 1 ) (9) i=1 PPL M (D) = 2 H(P D;P M ) = [P M (w 1,, w N )] 1 N (10) For both of WER and PPL, the lower, the better! Close correlation between WER and PPL GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 7/19

14 Information Retrieval Given a query doc, find relevant docs from the collection Vector-based IR: Represent each doc as a Bag-Of-Words (BOW) Convert the BOW into the vector of terms All terms are considered as independent (orthogonal) Measure similarity between a query and docs air automobile bank car... query : ( ) doc : ( ) Similarity measure: cosine similarity inner product sim( d j, q) = t t i=1 w ij w iq i=1 w2 ij t i=1 w2 iq (11) where d j = (w 1j,, w tj ) and q = (w 1q,, w tq ) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 8/19

15 Information Retrieval Retrieval performance evaluation: Precision = Recall = No. {Retrieved docs Relevant docs} No. Retrieved docs No. {Retrieved docs Relevant docs} No. Relevant docs (in the collection) (12) (13) Problems Terms are not orthgonal Mismatches between queries and docs Polysemy: e.g., bank (❶ river, ❷ money, ) recall Synonymy: e.g., car, automobile, vehicle, precision Latent Semantic Analysis (LSA) has been proposed Cross-Lingual IR: query & docs are in different languages Query translation approach GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 9/19

16 Cross-Lingual LM for ASR Mandarin Story Automatic Speech Recognition Baseline Chinese Acoustic Model Chinese Dictionary (Vocabulary) Contemporaneous English Articles C di Automatic Transcription Baseline Chinese Language Model Translation Lexicons Cross-Language Information Retrieval d E i English Article Aligned with Mandarin Story P ˆ(e d Statistical Machine Translation P T (c e) E P CLunigram( c di ) Cross-Language Unigram Model E i ) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 10/19

17 Model Estimation Assume document correspondence, d E i d C i, is known for Chinese test doc d C i, P CL-unigram (c d E i ) = P T (c e) ˆP(e d E i ), c C (14) e E Cross-Language LM construction Build story-specific cross-language LMs, P(c d E i ) Linear interpolation with the baseline trigram LM P CL-interpolated (c k c k 1, c k 2, d E i ) (15) = λp CL-unigram (c k d E i ) + (1 λ)p(c k c k 1, c k 2 ) λ is optimized to minimize the PPL of heldout data via EM algorithm GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 11/19

18 Model Estimation Document correspondence obtained by CLIR For each Chinese test doc d C i, create an English BOW Find the English doc with the highest cosine similarity d E i = arg max d E j DE sim CL (P(e d C i ), ˆP(e d E j )) (16) Estimation of P T (c e) and P T (e c) GIZA++ : statistical MT tool based on IBM model-4 Input : Hong Kong news Chinese-English sentence-aligned parallel corpus 18K docs, 200K sents, 4M wds each We only need translation tables : P T (e c) and P T (c e) Mutual Information-based CL-triggers CL-LSA (Latent Semantic Analysis) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 12/19

19 Cross-Lingual Lexical Triggers: Identification Monolingual Triggers : e.g. either... or Let Cross-Lingual Setting : Translation lexicons Based on Average Mutual Information (I(e; c)) P(e, c) = #d(e, c) N and P(e, c) = #d(e, c) N (17) where #d(e) denote the number of English articles in which e occurs, and let P(e) = #d(e) N and P(c e) = P(e, c) P(e) (18) I(e; c) = P(e, c) log P(c e) P(c) + P(e, c) log P( c e) P( c) + P(ē, c) log P(c ē) P(c) + P(ē, c) log P( c ē) P( c) (19) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 13/19

20 Cross-Lingual Lexical Triggers: Estimation We estimate the trigger-based CL unigram probs with P Trig (c e) = I(e; c) c C I(e; c ), (20) Analogous to (14), P Trig-unigram (c d E i ) = e E P Trig (c e) ˆP(e d E i ) (21) Again, we build the interpolated model P Trig-interpolated (c k c k 1, c k 2, d E i ) (22) = λp Trig-unigram (c k d E i ) + (1 λ)p(c k c k 1, c k 2 ) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 14/19

21 Latent Semantic Analysis for CLIR Singular Value Decomposition (SVD) of the parallel corpus d E 1 W U S V T... d E N = d C 1... d C N M N M R R R R N x x x x Input : word-document frequency matrix, W Reduce the dimension into the smaller but adequate subpace Singular Value Decomposition : U, V, and S S : diagonal matrix w/ diagonal entries σ 1,, σ k where σ 1 σ 2 σ k (k R) Remove noisy entries by setting σ i = 0 for i > R GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 15/19

22 Latent Semantic Analysis for CLIR Singular Value Decomposition (SVD) of the parallel corpus W U S V T d E J = d C J M N M R R R R N x x x x Input : word-document frequency matrix, W Reduce the dimension into the smaller but adequate subpace Singular Value Decomposition : U, V, and S S : diagonal matrix w/ diagonal entries σ 1,, σ k where σ 1 σ 2 σ k (k R) Remove noisy entries by setting σ i = 0 for i > R GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 15/19

23 Latent Semantic Analysis for CLIR Singular Value Decomposition (SVD) of the parallel corpus W U S V T d E J = d C J M N M R R R R N x x x x Input : word-document frequency matrix, W Reduce the dimension into the smaller but adequate subpace Singular Value Decomposition : U, V, and S S : diagonal matrix w/ diagonal entries σ 1,, σ k where σ 1 σ 2 σ k (k R) Remove noisy entries by setting σ i = 0 for i > R GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 15/19

24 Latent Semantic Analysis for CLIR d E 1... Folding-in a monolingual corpus W U S V T d E P = M P M R R R R P x x x x Given a monolingual corpus, W, in either side Use the same matrices U, S Project into low-dimensional space, V = S 1 U 1 W Compare a query and a document in the reduced dimensional space GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 16/19

25 Training and Test Corpora Acoustic model training HUB4-NE Mandarin training data (96K wds) 10 hours Chinese monolingual language model training XINHUA : 13M wds HUB4-NE : 96K wds ASR test set : NIST HUB4-NE test data (only F0 portion) 1263 sents, 9.8K wds ( ) English CLIR corpus : NAB-TDT NAB (1997 LA, WP) + TDT-2 (1998 APW, NYT) 45K docs, 30M wds GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 17/19

26 ASR Experimental Results Vocab : 51K for Chinese 300-best list rescoring Oracle best/worst WER : 33.4/94.4% for Xinhua and 39.7/95.5% for HUB4-NE Language Model Perp WER CER p-value Xinhua Trigram % 28.8% Trig-interpolated % 28.6% LSA-interpolated % 28.9% CL-interpolated % 28.4% < HUB4-NE Trigram % 44.1% Trig-interpolated % 43.3% < LSA-interpolated % 43.1% <0.001 CL-interpolated % 43.1% < GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 18/19

27 Conclusions Exploits side-information from contemporaneous articles useful for resource-deficient languages Statistically significant improvements in ASR WER Use of CL triggers & CL-LSA A document-aligned corpus suffices rather than a sentence-aligned corpus Future work Extensions to higher order N-grams (e.g., bigrams) Discriminate LMs for Word Sense Disambiguation story-specific translation models Applications to other languages (e.g., Arabic) and other tasks (e.g., MT) GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR p. 19/19

Chapter 3: Basics of Language Modelling

Chapter 3: Basics of Language Modelling Chapter 3: Basics of Language Modelling Motivation Language Models are used in Speech Recognition Machine Translation Natural Language Generation Query completion For research and development: need a simple

More information

The Noisy Channel Model and Markov Models

The Noisy Channel Model and Markov Models 1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

Chapter 3: Basics of Language Modeling

Chapter 3: Basics of Language Modeling Chapter 3: Basics of Language Modeling Section 3.1. Language Modeling in Automatic Speech Recognition (ASR) All graphs in this section are from the book by Schukat-Talamazzini unless indicated otherwise

More information

N-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24

N-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24 L545 Dept. of Linguistics, Indiana University Spring 2013 1 / 24 Morphosyntax We just finished talking about morphology (cf. words) And pretty soon we re going to discuss syntax (cf. sentences) In between,

More information

N-gram Language Modeling Tutorial

N-gram Language Modeling Tutorial N-gram Language Modeling Tutorial Dustin Hillard and Sarah Petersen Lecture notes courtesy of Prof. Mari Ostendorf Outline: Statistical Language Model (LM) Basics n-gram models Class LMs Cache LMs Mixtures

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

Discriminative Training

Discriminative Training Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder

More information

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component

More information

Low-Dimensional Discriminative Reranking. Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park

Low-Dimensional Discriminative Reranking. Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park Low-Dimensional Discriminative Reranking Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park Discriminative Reranking Useful for many NLP tasks Enables us to use arbitrary features

More information

IBM Model 1 for Machine Translation

IBM Model 1 for Machine Translation IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of

More information

Lecture 13: More uses of Language Models

Lecture 13: More uses of Language Models Lecture 13: More uses of Language Models William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 13 What we ll learn in this lecture Comparing documents, corpora using LM approaches

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing N-grams and language models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 25 Introduction Goals: Estimate the probability that a

More information

Latent semantic indexing

Latent semantic indexing Latent semantic indexing Relationship between concepts and words is many-to-many. Solve problems of synonymy and ambiguity by representing documents as vectors of ideas or concepts, not terms. For retrieval,

More information

Phrase-Based Statistical Machine Translation with Pivot Languages

Phrase-Based Statistical Machine Translation with Pivot Languages Phrase-Based Statistical Machine Translation with Pivot Languages N. Bertoldi, M. Barbaiani, M. Federico, R. Cattoni FBK, Trento - Italy Rovira i Virgili University, Tarragona - Spain October 21st, 2008

More information

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding

More information

TnT Part of Speech Tagger

TnT Part of Speech Tagger TnT Part of Speech Tagger By Thorsten Brants Presented By Arghya Roy Chaudhuri Kevin Patel Satyam July 29, 2014 1 / 31 Outline 1 Why Then? Why Now? 2 Underlying Model Other technicalities 3 Evaluation

More information

CS 572: Information Retrieval

CS 572: Information Retrieval CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).

More information

Natural Language Processing SoSe Language Modelling. (based on the slides of Dr. Saeedeh Momtazi)

Natural Language Processing SoSe Language Modelling. (based on the slides of Dr. Saeedeh Momtazi) Natural Language Processing SoSe 2015 Language Modelling Dr. Mariana Neves April 20th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline 2 Motivation Estimation Evaluation Smoothing Outline 3 Motivation

More information

DT2118 Speech and Speaker Recognition

DT2118 Speech and Speaker Recognition DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language

More information

{ Jurafsky & Martin Ch. 6:! 6.6 incl.

{ Jurafsky & Martin Ch. 6:! 6.6 incl. N-grams Now Simple (Unsmoothed) N-grams Smoothing { Add-one Smoothing { Backo { Deleted Interpolation Reading: { Jurafsky & Martin Ch. 6:! 6.6 incl. 1 Word-prediction Applications Augmentative Communication

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School University of Pisa Pisa, 7-19 May 2008 Part V: Language Modeling 1 Comparing ASR and statistical MT N-gram

More information

Language Processing with Perl and Prolog

Language Processing with Perl and Prolog Language Processing with Perl and Prolog Chapter 5: Counting Words Pierre Nugues Lund University Pierre.Nugues@cs.lth.se http://cs.lth.se/pierre_nugues/ Pierre Nugues Language Processing with Perl and

More information

N-gram Language Modeling

N-gram Language Modeling N-gram Language Modeling Outline: Statistical Language Model (LM) Intro General N-gram models Basic (non-parametric) n-grams Class LMs Mixtures Part I: Statistical Language Model (LM) Intro What is a statistical

More information

Information Retrieval

Information Retrieval Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices

More information

Linear Algebra Background

Linear Algebra Background CS76A Text Retrieval and Mining Lecture 5 Recap: Clustering Hierarchical clustering Agglomerative clustering techniques Evaluation Term vs. document space clustering Multi-lingual docs Feature selection

More information

Triplet Lexicon Models for Statistical Machine Translation

Triplet Lexicon Models for Statistical Machine Translation Triplet Lexicon Models for Statistical Machine Translation Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer lastname@cs.rwth-aachen.de CLSP Student Seminar February 6, 2009 Human Language

More information

Probabilistic Language Modeling

Probabilistic Language Modeling Predicting String Probabilities Probabilistic Language Modeling Which string is more likely? (Which string is more grammatical?) Grill doctoral candidates. Regina Barzilay EECS Department MIT November

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Language Models Tobias Scheffer Stochastic Language Models A stochastic language model is a probability distribution over words.

More information

Multiple System Combination. Jinhua Du CNGL July 23, 2008

Multiple System Combination. Jinhua Du CNGL July 23, 2008 Multiple System Combination Jinhua Du CNGL July 23, 2008 Outline Introduction Motivation Current Achievements Combination Strategies Key Techniques System Combination Framework in IA Large-Scale Experiments

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

Language Models. Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN

Language Models. Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN Language Models Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN Data Science: Jordan Boyd-Graber UMD Language Models 1 / 8 Language models Language models answer

More information

Lecture 3: ASR: HMMs, Forward, Viterbi

Lecture 3: ASR: HMMs, Forward, Viterbi Original slides by Dan Jurafsky CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 3: ASR: HMMs, Forward, Viterbi Fun informative read on phonetics The

More information

Language Modeling. Michael Collins, Columbia University

Language Modeling. Michael Collins, Columbia University Language Modeling Michael Collins, Columbia University Overview The language modeling problem Trigram models Evaluating language models: perplexity Estimation techniques: Linear interpolation Discounting

More information

Natural Language Processing SoSe Words and Language Model

Natural Language Processing SoSe Words and Language Model Natural Language Processing SoSe 2016 Words and Language Model Dr. Mariana Neves May 2nd, 2016 Outline 2 Words Language Model Outline 3 Words Language Model Tokenization Separation of words in a sentence

More information

Conditional Language Modeling. Chris Dyer

Conditional Language Modeling. Chris Dyer Conditional Language Modeling Chris Dyer Unconditional LMs A language model assigns probabilities to sequences of words,. w =(w 1,w 2,...,w`) It is convenient to decompose this probability using the chain

More information

Deep Learning for Speech Recognition. Hung-yi Lee

Deep Learning for Speech Recognition. Hung-yi Lee Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New

More information

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition Yu-Seop Kim 1, Jeong-Ho Chang 2, and Byoung-Tak Zhang 2 1 Division of Information and Telecommunication

More information

Learning to translate with neural networks. Michael Auli

Learning to translate with neural networks. Michael Auli Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each

More information

SYNTHER A NEW M-GRAM POS TAGGER

SYNTHER A NEW M-GRAM POS TAGGER SYNTHER A NEW M-GRAM POS TAGGER David Sündermann and Hermann Ney RWTH Aachen University of Technology, Computer Science Department Ahornstr. 55, 52056 Aachen, Germany {suendermann,ney}@cs.rwth-aachen.de

More information

Automatic Speech Recognition and Statistical Machine Translation under Uncertainty

Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Outlines Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Lambert Mathias Advisor: Prof. William Byrne Thesis Committee: Prof. Gerard Meyer, Prof. Trac Tran and Prof.

More information

Midterm sample questions

Midterm sample questions Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts

More information

Speech Translation: from Singlebest to N-Best to Lattice Translation. Spoken Language Communication Laboratories

Speech Translation: from Singlebest to N-Best to Lattice Translation. Spoken Language Communication Laboratories Speech Translation: from Singlebest to N-Best to Lattice Translation Ruiqiang ZHANG Genichiro KIKUI Spoken Language Communication Laboratories 2 Speech Translation Structure Single-best only ASR Single-best

More information

Naïve Bayes, Maxent and Neural Models

Naïve Bayes, Maxent and Neural Models Naïve Bayes, Maxent and Neural Models CMSC 473/673 UMBC Some slides adapted from 3SLP Outline Recap: classification (MAP vs. noisy channel) & evaluation Naïve Bayes (NB) classification Terminology: bag-of-words

More information

An Algorithm for Fast Calculation of Back-off N-gram Probabilities with Unigram Rescaling

An Algorithm for Fast Calculation of Back-off N-gram Probabilities with Unigram Rescaling An Algorithm for Fast Calculation of Back-off N-gram Probabilities with Unigram Rescaling Masaharu Kato, Tetsuo Kosaka, Akinori Ito and Shozo Makino Abstract Topic-based stochastic models such as the probabilistic

More information

Categorization ANLP Lecture 10 Text Categorization with Naive Bayes

Categorization ANLP Lecture 10 Text Categorization with Naive Bayes 1 Categorization ANLP Lecture 10 Text Categorization with Naive Bayes Sharon Goldwater 6 October 2014 Important task for both humans and machines object identification face recognition spoken word recognition

More information

ANLP Lecture 10 Text Categorization with Naive Bayes

ANLP Lecture 10 Text Categorization with Naive Bayes ANLP Lecture 10 Text Categorization with Naive Bayes Sharon Goldwater 6 October 2014 Categorization Important task for both humans and machines 1 object identification face recognition spoken word recognition

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

Adapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization

Adapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization Adapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur Human Language Technology Center of Excellence Center for Language

More information

Natural Language Processing. Statistical Inference: n-grams

Natural Language Processing. Statistical Inference: n-grams Natural Language Processing Statistical Inference: n-grams Updated 3/2009 Statistical Inference Statistical Inference consists of taking some data (generated in accordance with some unknown probability

More information

Variational Decoding for Statistical Machine Translation

Variational Decoding for Statistical Machine Translation Variational Decoding for Statistical Machine Translation Zhifei Li, Jason Eisner, and Sanjeev Khudanpur Center for Language and Speech Processing Computer Science Department Johns Hopkins University 1

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Language Models, Graphical Models Sameer Maskey Week 13, April 13, 2010 Some slides provided by Stanley Chen and from Bishop Book Resources 1 Announcements Final Project Due,

More information

NLP: N-Grams. Dan Garrette December 27, Predictive text (text messaging clients, search engines, etc)

NLP: N-Grams. Dan Garrette December 27, Predictive text (text messaging clients, search engines, etc) NLP: N-Grams Dan Garrette dhg@cs.utexas.edu December 27, 2013 1 Language Modeling Tasks Language idenfication / Authorship identification Machine Translation Speech recognition Optical character recognition

More information

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21

More information

Discriminative Training. March 4, 2014

Discriminative Training. March 4, 2014 Discriminative Training March 4, 2014 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder e =

More information

Machine Translation. CL1: Jordan Boyd-Graber. University of Maryland. November 11, 2013

Machine Translation. CL1: Jordan Boyd-Graber. University of Maryland. November 11, 2013 Machine Translation CL1: Jordan Boyd-Graber University of Maryland November 11, 2013 Adapted from material by Philipp Koehn CL1: Jordan Boyd-Graber (UMD) Machine Translation November 11, 2013 1 / 48 Roadmap

More information

Deep Learning Basics Lecture 10: Neural Language Models. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 10: Neural Language Models. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 10: Neural Language Models Princeton University COS 495 Instructor: Yingyu Liang Natural language Processing (NLP) The processing of the human languages by computers One of

More information

Topic Modeling: Beyond Bag-of-Words

Topic Modeling: Beyond Bag-of-Words University of Cambridge hmw26@cam.ac.uk June 26, 2006 Generative Probabilistic Models of Text Used in text compression, predictive text entry, information retrieval Estimate probability of a word in a

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

Advanced topics in language modeling: MaxEnt, features and marginal distritubion constraints

Advanced topics in language modeling: MaxEnt, features and marginal distritubion constraints Advanced topics in language modeling: MaxEnt, features and marginal distritubion constraints Brian Roark, Google NYU CSCI-GA.2585 Speech Recognition, Weds., April 15, 2015 (Don t forget to pay your taxes!)

More information

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Statistical NLP Spring The Noisy Channel Model

Statistical NLP Spring The Noisy Channel Model Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Semantic Similarity from Corpora - Latent Semantic Analysis

Semantic Similarity from Corpora - Latent Semantic Analysis Semantic Similarity from Corpora - Latent Semantic Analysis Carlo Strapparava FBK-Irst Istituto per la ricerca scientifica e tecnologica I-385 Povo, Trento, ITALY strappa@fbk.eu Overview Latent Semantic

More information

N-gram N-gram Language Model for Large-Vocabulary Continuous Speech Recognition

N-gram N-gram Language Model for Large-Vocabulary Continuous Speech Recognition 2010 11 5 N-gram N-gram Language Model for Large-Vocabulary Continuous Speech Recognition 1 48-106413 Abstract Large-Vocabulary Continuous Speech Recognition(LVCSR) system has rapidly been growing today.

More information

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Latent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Latent Semantic Models Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Vector Space Model: Pros Automatic selection of index terms Partial matching of queries

More information

Ranked Retrieval (2)

Ranked Retrieval (2) Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF

More information

CRF Word Alignment & Noisy Channel Translation

CRF Word Alignment & Noisy Channel Translation CRF Word Alignment & Noisy Channel Translation January 31, 2013 Last Time... X p( Translation)= p(, Translation) Alignment Alignment Last Time... X p( Translation)= p(, Translation) Alignment X Alignment

More information

Ngram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer

Ngram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer + Ngram Review October13, 2017 Professor Meteer CS 136 Lecture 10 Language Modeling Thanks to Dan Jurafsky for these slides + ASR components n Feature Extraction, MFCCs, start of Acoustic n HMMs, the Forward

More information

Language Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky

Language Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky Language Modeling Introduction to N-grams Many Slides are adapted from slides by Dan Jurafsky Probabilistic Language Models Today s goal: assign a probability to a sentence Why? Machine Translation: P(high

More information

Pre-Initialized Composition For Large-Vocabulary Speech Recognition

Pre-Initialized Composition For Large-Vocabulary Speech Recognition Pre-Initialized Composition For Large-Vocabulary Speech Recognition Cyril Allauzen, Michael Riley Google Research, 76 Ninth Avenue, New York, NY, USA allauzen@google.com, riley@google.com Abstract This

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models Statistical NLP Spring 2010 The Noisy Channel Model Lecture 9: Acoustic Models Dan Klein UC Berkeley Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions Language model: Distributions

More information

Language Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky

Language Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky Language Modeling Introduction to N-grams Many Slides are adapted from slides by Dan Jurafsky Probabilistic Language Models Today s goal: assign a probability to a sentence Why? Machine Translation: P(high

More information

Hidden Markov Modelling

Hidden Markov Modelling Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models

More information

INF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING. Crista Lopes

INF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING. Crista Lopes INF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING Crista Lopes Outline Precision and Recall The problem with indexing so far Intuition for solving it Overview of the solution The Math How to measure

More information

1 Introduction. Sept CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall Consider the following examples:

1 Introduction. Sept CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall Consider the following examples: Sept. 20-22 2005 1 CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall 2005 In this lecture we introduce some of the mathematical tools that are at the base of the technical approaches

More information

Word Alignment III: Fertility Models & CRFs. February 3, 2015

Word Alignment III: Fertility Models & CRFs. February 3, 2015 Word Alignment III: Fertility Models & CRFs February 3, 2015 Last Time... X p( Translation)= p(, Translation) Alignment = X Alignment Alignment p( p( Alignment) Translation Alignment) {z } {z } X z } {

More information

Manning & Schuetze, FSNLP (c) 1999,2000

Manning & Schuetze, FSNLP (c) 1999,2000 558 15 Topics in Information Retrieval (15.10) y 4 3 2 1 0 0 1 2 3 4 5 6 7 8 Figure 15.7 An example of linear regression. The line y = 0.25x + 1 is the best least-squares fit for the four points (1,1),

More information

Fun with weighted FSTs

Fun with weighted FSTs Fun with weighted FSTs Informatics 2A: Lecture 18 Shay Cohen School of Informatics University of Edinburgh 29 October 2018 1 / 35 Kedzie et al. (2018) - Content Selection in Deep Learning Models of Summarization

More information

Log-linear models (part 1)

Log-linear models (part 1) Log-linear models (part 1) CS 690N, Spring 2018 Advanced Natural Language Processing http://people.cs.umass.edu/~brenocon/anlp2018/ Brendan O Connor College of Information and Computer Sciences University

More information

Modeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process

Modeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process Modeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process Songfang Huang and Steve Renals The Centre for Speech Technology Research University of Edinburgh, Edinburgh, EH8

More information

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipop Koehn) 30 January

More information

A fast and simple algorithm for training neural probabilistic language models

A fast and simple algorithm for training neural probabilistic language models A fast and simple algorithm for training neural probabilistic language models Andriy Mnih Joint work with Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 25 January 2013 1

More information

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya CS 375 Advanced Machine Learning Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya Outline SVD and LSI Kleinberg s Algorithm PageRank Algorithm Vector Space Model Vector space model represents

More information

Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model

Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model (most slides from Sharon Goldwater; some adapted from Philipp Koehn) 5 October 2016 Nathan Schneider

More information

Sparse Models for Speech Recognition

Sparse Models for Speech Recognition Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations

More information

Boolean and Vector Space Retrieval Models

Boolean and Vector Space Retrieval Models Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1

More information

perplexity = 2 cross-entropy (5.1) cross-entropy = 1 N log 2 likelihood (5.2) likelihood = P(w 1 w N ) (5.3)

perplexity = 2 cross-entropy (5.1) cross-entropy = 1 N log 2 likelihood (5.2) likelihood = P(w 1 w N ) (5.3) Chapter 5 Language Modeling 5.1 Introduction A language model is simply a model of what strings (of words) are more or less likely to be generated by a speaker of English (or some other language). More

More information

Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning

Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Zaihu Pang 1, Xihong Wu 1, and Lei Xu 1,2 1 Speech and Hearing Research Center, Key Laboratory of Machine

More information

Natural Language Processing and Recurrent Neural Networks

Natural Language Processing and Recurrent Neural Networks Natural Language Processing and Recurrent Neural Networks Pranay Tarafdar October 19 th, 2018 Outline Introduction to NLP Word2vec RNN GRU LSTM Demo What is NLP? Natural Language? : Huge amount of information

More information

Fast Logistic Regression for Text Categorization with Variable-Length N-grams

Fast Logistic Regression for Text Categorization with Variable-Length N-grams Fast Logistic Regression for Text Categorization with Variable-Length N-grams Georgiana Ifrim *, Gökhan Bakır +, Gerhard Weikum * * Max-Planck Institute for Informatics Saarbrücken, Germany + Google Switzerland

More information

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses)

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya CSE Dept., IIT Bombay 15 th Feb, 2011 Going forward

More information

Speech Recognition Lecture 5: N-gram Language Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri

Speech Recognition Lecture 5: N-gram Language Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri Speech Recognition Lecture 5: N-gram Language Models Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Components Acoustic and pronunciation model: Pr(o w) =

More information

Statistical Machine Translation and Automatic Speech Recognition under Uncertainty

Statistical Machine Translation and Automatic Speech Recognition under Uncertainty Statistical Machine Translation and Automatic Speech Recognition under Uncertainty Lambert Mathias A dissertation submitted to the Johns Hopkins University in conformity with the requirements for the degree

More information

Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization

Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Rachit Arora Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India.

More information

Recap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP

Recap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP Recap: Language models Foundations of atural Language Processing Lecture 4 Language Models: Evaluation and Smoothing Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipp

More information