N-gram Language Model for Large-Vocabulary Continuous Speech Recognition


November 5, 2010
48-106413

Abstract
Large-Vocabulary Continuous Speech Recognition (LVCSR) systems have been spreading rapidly. Their purpose is to automate the transcription of speech such as lectures and news broadcasts. A speech recognition system has two core parts, the acoustic model and the language model; I focus on the language model. Statistical language models are widely used in LVCSR, and above all the N-gram language model is the de facto standard for speech recognition. Although the N-gram model outperforms other language models, it still has some problems. In this paper, I introduce three major issues of the N-gram model and survey recent research trends in language modeling that address these problems.

1 Introduction
A Large-Vocabulary Continuous Speech Recognition (LVCSR) system [1] consists of two components, the acoustic model and the language model (Fig.1) [2][3]. This survey concentrates on the language model, in particular the N-gram model [4], which predicts each word from its preceding (N-1) words and is the de facto standard statistical language model [5, 6]. Section 2 reviews the N-gram model, Sections 3 to 5 discuss its three major issues together with recent approaches to each of them, and Section 6 concludes.

Fig.1 Overview of Speech Recognition

2 The N-gram language model

2.1 N-gram
The N-gram model [4] approximates the probability of a word by conditioning it only on the preceding (N-1) words.

The probability of a word sequence w_1, w_2, ..., w_k is decomposed by the chain rule into a product of conditional probabilities P(w_i \mid w_1^{i-1}). The N-gram model approximates each of these by conditioning only on the preceding (N-1) words:

P(w_i \mid w_1^{i-1}) \approx P(w_i \mid w_{i-n+1}^{i-1})   (1)

The models with N = 1, 2, 3 are called unigram, bigram, and trigram, respectively. The conditional probabilities are estimated from corpus counts C(\cdot):

P(w_i \mid w_{i-n+1}^{i-1}) = \frac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})}   (2)

In practice N = 2 or 3, i.e. a bigram or trigram model, is used [7]. Fig.2 shows an example of calculating a trigram probability.

Fig.2 An Example of Calculating Trigram Probability

2.2 Back-off smoothing
When an N-word sequence never occurs in the training corpus, the probability given by Eq. (2) becomes zero. Back-off smoothing [7] avoids this by falling back from the N-gram to the (N-1)-gram, e.g. from the trigram to the bigram, as illustrated in Fig.3. A toy sketch of both the count-based estimation and this back-off idea is given below.

Fig.3 Back-off Smoothing
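As a concrete illustration of the count-based estimation in Eq. (2) and the back-off idea of Section 2.2, here is a minimal Python sketch. The toy corpus, the function names, and the simplified fallback rule (no discounting or back-off weights, unlike a proper back-off model) are assumptions made purely for illustration.

```python
from collections import Counter

# Toy corpus: each sentence is padded with <s> ... </s> markers (illustrative).
corpus = [
    "<s> <s> the cat sat on the mat </s>".split(),
    "<s> <s> the cat ate the fish </s>".split(),
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))
    trigrams.update(zip(sent, sent[1:], sent[2:]))

def trigram_prob(w2, w1, w):
    """Maximum-likelihood trigram probability, Eq. (2):
    P(w | w2 w1) = C(w2 w1 w) / C(w2 w1)."""
    denom = bigrams[(w2, w1)]
    return trigrams[(w2, w1, w)] / denom if denom else 0.0

def backoff_prob(w2, w1, w):
    """Very simplified back-off: if the trigram is unseen, fall back to the
    bigram and then to the unigram.  Real back-off smoothing discounts and
    renormalises; this only illustrates the idea of Fig.3."""
    if trigrams[(w2, w1, w)] > 0:
        return trigram_prob(w2, w1, w)
    if bigrams[(w1, w)] > 0:
        return bigrams[(w1, w)] / unigrams[w1]
    return unigrams[w] / sum(unigrams.values())

print(trigram_prob("the", "cat", "sat"))    # seen trigram: C(the cat sat)/C(the cat)
print(backoff_prob("the", "cat", "slept"))  # unseen trigram -> back-off
```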

2.3 Issues of the N-gram model
The N-gram model has three major issues, which are discussed in Sections 3 to 5.

3 Limited context length
The first issue is that the N-gram model captures only a local context of (N-1) words, i.e. two words for a trigram. Approaches that try to capture longer-range dependencies include the cache model [8], the trigger model [9], variable-length N-grams [10], and the Structured Language Model [11].

3.1 Cache model
The cache model [8] exploits the tendency, illustrated in Fig.4, that a word which has appeared recently is likely to appear again. Let H = {w_{n-|H|}, ..., w_{n-1}} be the history of the most recent |H| words. The cache probability of the next word w_n is

P_c(w_n \mid H) = \frac{1}{|H|} \sum_{w_h \in H} \delta(w_n, w_h)   (3)

where \delta is the Kronecker delta,

\delta(i, j) = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j) \end{cases}   (4)

Fig.4 Probability of Occurrence of The Same Word Again

Kuhn and De Mori [8] combine this cache probability with an ordinary N-gram model (see also [12][13]). Since Eq. (3) only reflects the recent history H, the cache probability is not used alone but together with the N-gram probability; a minimal sketch of such a combination is given below.
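The following Python sketch shows the cache probability of Eq. (3) and an interpolation with a baseline N-gram probability. The interpolation weight `lam` and all names here are illustrative assumptions, not the actual combination scheme of [8].

```python
from collections import Counter

def cache_prob(word, history):
    """Cache probability, Eq. (3): relative frequency of `word`
    inside the recent history H."""
    if not history:
        return 0.0
    return Counter(history)[word] / len(history)

def combined_prob(word, history, ngram_prob, lam=0.2):
    """Illustrative linear interpolation of the cache probability with an
    ordinary N-gram probability P(word | context); `lam` is a hypothetical
    interpolation weight, not a value taken from [8]."""
    return lam * cache_prob(word, history) + (1.0 - lam) * ngram_prob

# Usage: suppose the baseline trigram gives P(w | context) = 0.01,
# and "budget" occurred 3 times in the last 100 words.
history = ["budget"] * 3 + ["other"] * 97
print(combined_prob("budget", history, ngram_prob=0.01))  # 0.2*0.03 + 0.8*0.01
```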

3.2 Trigger model
The trigger model [9] exploits long-range co-occurrence between two words, illustrated in Fig.5: when a word w_a appears in a document, a related word w_b tends to appear as well (see also [14][15]). Such a pair (w_a, w_b) is called a trigger pair, and the special case in which a word triggers itself, (w, w), is a self-trigger. Trigger features are incorporated into a maximum entropy (ME) model [3] of the form

P_T(w \mid d) \propto \exp\left( \sum_i \lambda_i f_i(d) \right)   (5)

with binary feature functions defined over the history d,

f_i(d) = \begin{cases} 1 & \text{if the trigger word for } w \text{ appears in } d \\ 0 & \text{otherwise} \end{cases}   (6)

where the weights \lambda_i are estimated during training [9][3].

Fig.5 Probability of Co-occurrence of Two Words

Trigger pairs are selected by their average mutual information,

MI(w_a; w_b) = P(w_a, w_b) \log \frac{P(w_b \mid w_a)}{P(w_b)} + P(w_a, \bar{w_b}) \log \frac{P(\bar{w_b} \mid w_a)}{P(\bar{w_b})} + P(\bar{w_a}, w_b) \log \frac{P(w_b \mid \bar{w_a})}{P(w_b)} + P(\bar{w_a}, \bar{w_b}) \log \frac{P(\bar{w_b} \mid \bar{w_a})}{P(\bar{w_b})}   (7)

where \bar{w} denotes the absence of w, and the k pairs with the highest MI are kept as triggers.
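The mutual-information criterion of Eq. (7) can be computed from co-occurrence counts. The Python sketch below estimates the probabilities from document-level presence/absence counts, which is an illustrative simplification; the counts and function name are made up.

```python
import math

def trigger_mi(n_ab, n_a, n_b, n_docs):
    """Average mutual information of a candidate trigger pair (w_a, w_b),
    Eq. (7), estimated from document-level counts:
      n_ab   -- documents containing both w_a and w_b
      n_a    -- documents containing w_a
      n_b    -- documents containing w_b
      n_docs -- total number of documents
    A small epsilon guards against log(0) for sparse counts."""
    eps = 1e-12
    n = float(n_docs)
    # Joint probabilities over the four presence/absence combinations.
    p_joint = {
        (1, 1): n_ab / n,
        (1, 0): (n_a - n_ab) / n,
        (0, 1): (n_b - n_ab) / n,
        (0, 0): (n - n_a - n_b + n_ab) / n,
    }
    p_a = {1: n_a / n, 0: (n - n_a) / n}   # marginal: w_a present / absent
    p_b = {1: n_b / n, 0: (n - n_b) / n}   # marginal: w_b present / absent
    mi = 0.0
    for a in (1, 0):
        for b in (1, 0):
            joint = p_joint[(a, b)]
            cond = joint / (p_a[a] + eps)   # P(w_b state | w_a state)
            mi += joint * math.log((cond + eps) / (p_b[b] + eps))
    return mi

# Usage: two words that co-occur in 80 of 1000 documents.
print(trigger_mi(n_ab=80, n_a=120, n_b=150, n_docs=1000))
```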

4 Language model adaptation
The second issue of the N-gram model is the mismatch between the corpus it is trained on and the task it is used for; this is addressed by language model adaptation [16, 17].

4.1 Types of adaptation
Two kinds of adaptation are considered here: topic adaptation and speech style adaptation.

4.2 Topic adaptation
Two representative approaches to topic adaptation are:
1. adaptation based on Probabilistic Latent Semantic Analysis (PLSA) [18], and
2. adaptation using documents collected from the World Wide Web (WWW).

4.2.1 PLSA
Probabilistic Latent Semantic Analysis (PLSA) [18] models the probability of a word w in a document d through latent topics z:

P(w \mid d) = \sum_z P(w \mid z) P(z \mid d)   (8)

The PLSA parameters are estimated with the EM algorithm. Because PLSA is a unigram-level model, it is combined with a bigram or trigram model by unigram rescaling [19]:

P'(w_i \mid w_{i-1} w_{i-2}) \propto \frac{P(w_i \mid d)}{P(w_i)} P(w_i \mid w_{i-1} w_{i-2})   (9)

which boosts the baseline trigram probability of words that are more likely in the current document than in general text. PLSA-based adaptation has been applied and extended in [20][21]. A minimal sketch of the rescaling step follows below.
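The sketch below applies the unigram rescaling of Eq. (9). Renormalising over a small candidate vocabulary, the made-up probabilities, and the function name are illustrative assumptions.

```python
def rescaled_trigram(candidates, trigram_prob, plsa_doc_prob, unigram_prob):
    """Unigram rescaling, Eq. (9): scale each baseline trigram probability
    P(w | w_{i-2} w_{i-1}) by P(w | d) / P(w), then renormalise over the
    candidate words so the result is again a distribution.
    All arguments are dictionaries keyed by candidate word (illustrative)."""
    scores = {
        w: trigram_prob[w] * (plsa_doc_prob[w] / unigram_prob[w])
        for w in candidates
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Usage with made-up numbers: "election" is boosted because the PLSA topic
# mixture of the current document makes it more likely than in general text.
candidates = ["election", "weather"]
adapted = rescaled_trigram(
    candidates,
    trigram_prob={"election": 0.02, "weather": 0.03},
    plsa_doc_prob={"election": 0.010, "weather": 0.001},
    unigram_prob={"election": 0.002, "weather": 0.002},
)
print(adapted)  # "election" now dominates the adapted distribution
```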

4.2.2 Adaptation using the Web
The World Wide Web can also be used as a corpus for topic adaptation [13][22]. The procedure, outlined in Fig.6, consists of five steps in which text related to the recognition target is retrieved from the Web and used to adapt the language model.

Fig.6 Unsupervised Language Model Adaptation by Using Web

4.2.3 Word selection with tf-idf
The words used as Web retrieval queries are selected by tf-idf weighting [23]. The tf-idf weight is the product of two quantities, the term frequency (tf) and the inverse document frequency (idf):

tfidf = tf \cdot idf   (10)

tf_i = \frac{n_i}{\sum_k n_k}   (11)

idf_i = \log \frac{|D|}{|\{d : d \ni t_i\}|}   (12)

where n_i is the number of occurrences of term t_i in the document, |D| is the total number of documents, and |{d : d ∋ t_i}| is the number of documents containing t_i. Terms with high tf-idf weights are characteristic of the document and are therefore suitable as retrieval queries [12]; tf-idf-based selection for Web adaptation is also used in [24]. A computational sketch of this weighting is given after Section 4.3 below.

4.3 Speech style adaptation
Speech style adaptation addresses the difference in style between written text and spoken language [25]. Fig.7 shows an example of a tagged transcript used for this purpose [26], from which an N-gram model of the spoken style can be trained.

Fig.7 An Example of tagged transcript
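The tf-idf weighting of Eqs. (10)-(12) in Section 4.2.3 can be sketched as follows; the toy documents and function names are illustrative assumptions.

```python
import math
from collections import Counter

# Toy document collection D (illustrative).
docs = [
    "stock prices fell sharply in tokyo".split(),
    "heavy rain is expected in tokyo tomorrow".split(),
    "stock markets recovered after the announcement".split(),
]

def tf(term, doc):
    """Term frequency, Eq. (11): n_i / sum_k n_k within one document."""
    counts = Counter(doc)
    return counts[term] / sum(counts.values())

def idf(term, docs):
    """Inverse document frequency, Eq. (12): log(|D| / |{d : t_i in d}|)."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, doc, docs):
    """tf-idf weight, Eq. (10): the product of tf and idf."""
    return tf(term, doc) * idf(term, docs)

# "prices" appears only in the first document, so it gets a higher weight
# than "tokyo", which appears in two of the three documents.
print(tfidf("prices", docs[0], docs))
print(tfidf("tokyo", docs[0], docs))
```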

5 Unknown words
The third issue is unknown (out-of-vocabulary) words. Words outside the recognition vocabulary are usually mapped to a single class UNK, and the N-gram model only provides probabilities such as P(UNK \mid w_{i-2} w_{i-1}), so the unknown words themselves cannot be recognized. New words appear continually, for example on the Web, so the problem cannot be solved simply by enlarging the vocabulary.

One approach is a hierarchical language model that embeds class-dependent word models for unknown words inside the word N-gram [27], as illustrated in Fig.8. Other work expands the vocabulary using resources such as the Web [28] and Wikipedia [29].

Fig.8 A Hierarchical Language Model

6 Conclusion
This paper reviewed the N-gram language model, introduced its three major issues, and surveyed recent research in language modeling that addresses each of them.

References
[1] (in Japanese), 2001.
[2] (in Japanese), Vol. 72, No. 8, pp. 1284-1290, 1989.
[3] (in Japanese), 1999.
[4] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, pp. 623-656, 1948.
[5] (in Japanese), Vol. 2008, No. 68, pp. 43-46, 2008.
[6] R. Rosenfeld. Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE, pp. 1270-1278, 2000.
[7] (in Japanese), pp. 1-21, 2010.
[8] R. Kuhn and R. De Mori. A cache-based natural language model for speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 6, pp. 570-583, 1990.
[9] R. Rosenfeld. A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language, Vol. 10, No. 3, p. 187, 1996.
[10] R. Kneser. Statistical language modeling using a variable context length. In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96), Vol. 1, pp. 494-497. IEEE, 1996.
[11] C. Chelba and F. Jelinek. Recognition performance of a structured language model. arXiv preprint cs/0001022, 2000.
[12] (in Japanese), pp. 89-94, 2007.
[13] (in Japanese), Vol. 50, No. 2, pp. 469-476, 2009.
[14] (in Japanese), 2006.
[15] (in Japanese). SLP, Vol. 2005, No. 69, pp. 13-18, 2005.
[16] A. Berger and R. Miller. Just-in-time language modelling. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, pp. 705-708. IEEE, 1998.
[17] D. Vaufreydaz, M. Akbar, and J. Rouillard. Internet documents: a rich source for spoken language modeling. In ASRU '99 Workshop, pp. 277-280, 1999.
[18] T. Hofmann. Probabilistic latent semantic analysis. In Proc. of Uncertainty in Artificial Intelligence (UAI '99), pp. 289-296, 1999.
[19] D. Gildea and T. Hofmann. Topic-based language modeling using EM. In Proc. Eurospeech, pp. 2167-2170, 1999.
[20] (in Japanese; PLSA). SLP, Vol. 2003, No. 124, pp. 67-72, 2003.
[21] (in Japanese; PLSA), pp. 233-238, 2006.
[22] (in Japanese; Web), pp. 57-58, 2010.
[23] G. Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989.
[24] (in Japanese; Web), pp. 57-58, 2010.
[25] (in Japanese). NLC, Vol. 105, No. 494, pp. 19-24, 2005.
[26] (in Japanese). SLP, Vol. 2007, No. 75, pp. 1-6, 2007.
[27] K. Tanigaki, H. Yamamoto, and Y. Sagisaka. A hierarchical language model incorporating class-dependent word models for OOV words recognition. In Proceedings of the 6th International Conference on Spoken Language Processing (Volume 3), 2000.
[28] (in Japanese; Web), Vol. 2008, No. 3, pp. 10-16, 2008.
[29] (in Japanese; Wikipedia), Vol. 11, 2009.