N-gram Language Model for Large-Vocabulary Continuous Speech Recognition


Abstract

Large-Vocabulary Continuous Speech Recognition (LVCSR) systems have been developing rapidly. Their purpose is to automate the transcription of speech such as lectures and news broadcasts. A speech recognition system has two core parts, an acoustic model and a language model; this paper focuses on the language model. Statistical language models are widely used in LVCSR, and among them the N-gram language model is the de facto standard for speech recognition. Although the N-gram outperforms other language models, it still has some problems. This paper introduces three major issues of the N-gram model and surveys recent research trends in language modeling that address them.

1 Introduction

[Japanese body text lost in extraction; the recoverable outline follows.] A Large-Vocabulary Continuous Speech Recognition (LVCSR) system [1] consists of two components, an acoustic model and a language model (Fig.1 Overview of Speech Recognition) [2, 3]. Among statistical language models, the N-gram model [4], which predicts each word from the preceding (N-1) words, is the de facto standard [5, 6]; this paper surveys its problems and the models proposed to solve them.

2 The N-gram Language Model

2.1 Definition of the N-gram Model

The N-gram model [4] conditions the probability of each word on the (N-1) words that precede it, as follows.

In an N-gram model, the probability of word $w_i$ given its full history $w_1^{i-1}$ is approximated by conditioning only on the preceding (N-1) words:

$$P(w_i \mid w_1^{i-1}) \approx P(w_i \mid w_{i-N+1}^{i-1}) \tag{1}$$

For N = 1, 2, and 3 the model is called a unigram, bigram, and trigram, respectively. The conditional probabilities are estimated from counts in a training corpus:

$$P(w_i \mid w_{i-N+1}^{i-1}) = \frac{C(w_{i-N+1}^{i})}{C(w_{i-N+1}^{i-1})} \tag{2}$$

where $C(\cdot)$ denotes the number of times a word sequence occurs. In practice N = 2 or 3, i.e. a bigram or trigram, is used [7] (Fig.2 An Example of Calculating Trigram Probability).

2.2 Back-off Smoothing

A word sequence that never occurs in the training corpus receives probability 0 under Eq. (2). Back-off smoothing [7] avoids such zero probabilities by falling back from the N-gram to the (N-1)-gram estimate, for example from an unseen trigram to the corresponding bigram (Fig.3 Back-off Smoothing).
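To make Eqs. (1)-(2) and back-off concrete, here is a minimal Python sketch (not from the paper) of a count-based bigram model that backs off to a scaled unigram for unseen pairs. The toy corpus is an illustrative assumption, and the fixed back-off weight ("stupid back-off") stands in for the properly normalized back-off weights of [7].

```python
from collections import Counter

# Toy corpus; a real LVCSR system trains on a large text corpus.
corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)                  # C(w)
bigrams = Counter(zip(corpus, corpus[1:]))  # C(w_{i-1} w_i)
total = sum(unigrams.values())

def p_ml(prev, word):
    """Maximum-likelihood bigram estimate, Eq. (2) with N = 2."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def p_backoff(prev, word, alpha=0.4):
    """Back off to a scaled unigram when the bigram was never seen.
    A fixed alpha is used for brevity; true back-off smoothing [7]
    computes weights so the distribution stays normalized."""
    if bigrams[(prev, word)]:
        return p_ml(prev, word)
    return alpha * unigrams[word] / total

print(p_backoff("the", "cat"))   # seen bigram: ML estimate 2/3
print(p_backoff("cat", "on"))    # unseen bigram: backed-off unigram
```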

2.3 Problems of the N-gram Model

[Text lost in extraction; the issues raised here are treated in Sections 3-5 below.]

3 Exploiting Context beyond the N-gram

An N-gram model sees only the (N-1) preceding words; a trigram, for example, looks back just two words. Models that exploit longer-range context include the cache model [8], the trigger model [9], the variable-context-length N-gram [10], and the Structured Language Model [11].

3.1 Cache Model [8]

A word that has occurred recently tends to occur again soon (Fig.4 Probability of Occurrence of the Same Word Again). Let $H = \{w_{n-|H|}, \ldots, w_{n-1}\}$ be the recent-word history. The cache probability of the next word $w_n$ is its relative frequency in $H$:

$$P_c(w_n \mid H) = \frac{1}{|H|} \sum_{w_h \in H} \delta(w_n, w_h) \tag{3}$$

where $|H|$ is the cache size and

$$\delta(i, j) = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j) \end{cases} \tag{4}$$

Kuhn et al. [8] combine this cache probability with the ordinary N-gram probability; later work builds on the same idea [12, 13].
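A small sketch of Eqs. (3)-(4), assuming a fixed-size recency cache; the linear interpolation weight used to mix the cache with an N-gram probability is an illustrative value, not one reported in [8].

```python
from collections import deque

class CacheLM:
    """Cache component of Eqs. (3)-(4): P_c(w | H) is the fraction of
    cache slots holding w, i.e. (1/|H|) * sum of delta(w, w_h)."""
    def __init__(self, cache_size=200):
        self.history = deque(maxlen=cache_size)  # H, the recent-word cache

    def observe(self, word):
        self.history.append(word)

    def p_cache(self, word):
        if not self.history:
            return 0.0
        return sum(1 for w in self.history if w == word) / len(self.history)

def p_interpolated(p_ngram, p_cache, lam=0.1):
    """Kuhn-style combination [8]: mix cache and N-gram probabilities."""
    return lam * p_cache + (1 - lam) * p_ngram

cache = CacheLM(cache_size=5)
for w in ["stocks", "fell", "as", "stocks"]:
    cache.observe(w)
print(cache.p_cache("stocks"))                     # 2/4 = 0.5
print(p_interpolated(0.01, cache.p_cache("stocks")))
```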

3.2 Trigger Model [9]

Beyond repetitions of a single word, the occurrence of one word can raise the probability of related words. Such a pair $w_a \rightarrow w_b$ is called a trigger pair (Fig.5 Probability of Co-occurrence of Two Words); the special case $w \rightarrow w$ is a self-trigger and corresponds to the cache effect [14, 15]. The trigger probability of word $w$ given the document history $d$ is modeled as

$$P_T(w \mid d) \propto \exp\Bigl(\sum_i \lambda_i f_i(d)\Bigr) \tag{5}$$

$$f_i(d) = \begin{cases} 1, & \text{if } d \ni w \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

Equation (5) is a maximum-entropy (ME) model [3]: each feature function $f_i(d)$ fires when the triggering word occurs in the history $d$, and the weights $\lambda_i$ are estimated from training data [9] [3]. Candidate trigger pairs are ranked by their average mutual information:

$$\begin{aligned} MI(w_a; w_b) ={} & P(w_a, w_b)\log\frac{P(w_b \mid w_a)}{P(w_b)} + P(w_a, \overline{w_b})\log\frac{P(\overline{w_b} \mid w_a)}{P(\overline{w_b})} \\ & + P(\overline{w_a}, w_b)\log\frac{P(w_b \mid \overline{w_a})}{P(w_b)} + P(\overline{w_a}, \overline{w_b})\log\frac{P(\overline{w_b} \mid \overline{w_a})}{P(\overline{w_b})} \end{aligned} \tag{7}$$

where $\overline{w}$ denotes the non-occurrence of $w$, and the top-k pairs are retained.
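Equation (7) can be computed from a 2x2 contingency table of counts. The sketch below, with made-up counts, sums the four terms as $P(x,y)\log[P(x,y)/(P(x)P(y))]$, which is algebraically identical to the conditional form in Eq. (7).

```python
import math

def avg_mutual_information(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Average mutual information, Eq. (7), for a candidate trigger pair
    (w_a -> w_b), from a 2x2 table of counts over corpus positions:
    rows = w_a did / did not occur in the history,
    cols = the next word is / is not w_b."""
    n = n_ab + n_a_notb + n_nota_b + n_nota_notb
    n_a, n_nota = n_ab + n_a_notb, n_nota_b + n_nota_notb   # row sums
    n_b, n_notb = n_ab + n_nota_b, n_a_notb + n_nota_notb   # column sums
    mi = 0.0
    for joint, row, col in [(n_ab, n_a, n_b), (n_a_notb, n_a, n_notb),
                            (n_nota_b, n_nota, n_b), (n_nota_notb, n_nota, n_notb)]:
        if joint:  # a zero joint count contributes nothing to the sum
            mi += (joint / n) * math.log(joint * n / (row * col))
    return mi

# Hypothetical counts: "stocks" in the history vs. "shares" as the next word.
print(avg_mutual_information(n_ab=40, n_a_notb=160, n_nota_b=10, n_nota_notb=790))
```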

4 Language Model Adaptation

A language model trained on general text does not necessarily match the target speech, so the N-gram model is adapted to the task (Language Model Adaptation) [16, 17]. Two directions are topic adaptation and speech-style adaptation.

4.2 Topic Adaptation

Two main approaches to topic adaptation are:

1. topic models such as Probabilistic Latent Semantic Analysis (PLSA) [18];
2. collecting adaptation text from the World Wide Web (WWW).

PLSA [18] models the probability of word $w$ in document $d$ through latent topic variables $z$:

$$P(w \mid d) = \sum_z P(w \mid z)\, P(z \mid d) \tag{8}$$

whose parameters are estimated with the EM algorithm. Because PLSA is a unigram model, it is combined with a bigram or trigram by unigram rescaling [19]:

$$P(w_i \mid w_{i-1} w_{i-2}, d) \propto \frac{P(w_i \mid d)}{P(w_i)}\, P(w_i \mid w_{i-1} w_{i-2}) \tag{9}$$

which scales the trigram probability up for words the current document favors over the background unigram. Several extensions of PLSA-based adaptation have been proposed [20, 21].
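A sketch of unigram rescaling, Eq. (9): each trigram probability is multiplied by $P(w \mid d)/P(w)$ and the result is renormalized over the vocabulary. The three toy distributions below are illustrative assumptions, with `p_doc` standing in for the PLSA mixture of Eq. (8).

```python
def unigram_rescale(p_trigram, p_topic, p_unigram):
    """Unigram rescaling, Eq. (9): scale each trigram probability by how much
    the document-adapted unigram P(w|d) boosts w over the background P(w),
    then renormalize so the result is a proper distribution.
    All three arguments map each vocabulary word to a probability."""
    scaled = {w: p_trigram[w] * (p_topic[w] / p_unigram[w]) for w in p_trigram}
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

# Toy distributions over a 3-word vocabulary (illustrative numbers only).
p_tri = {"bank": 0.2, "river": 0.3, "loan": 0.5}  # P(w | w_{i-1} w_{i-2})
p_doc = {"bank": 0.5, "river": 0.1, "loan": 0.4}  # P(w | d) from PLSA, Eq. (8)
p_uni = {"bank": 0.3, "river": 0.3, "loan": 0.4}  # background P(w)

print(unigram_rescale(p_tri, p_doc, p_uni))       # "bank" is boosted
```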

The second approach collects texts related to the recognition target from the World Wide Web and uses them as adaptation data [13, 22] (Fig.6 Unsupervised Language Model Adaptation by Using Web). [The steps of the procedure in Fig.6 are lost in extraction; steps 3-5 involve retrieving Web pages and building the adapted model.] The keywords used to query the Web are selected with the tf-idf measure [23], which scores a term highly when it is frequent in the target text (term frequency, tf) but rare across documents in general (inverse document frequency, idf):

$$\mathrm{tfidf} = \mathrm{tf} \cdot \mathrm{idf} \tag{10}$$

$$\mathrm{tf}_i = \frac{n_i}{\sum_k n_k} \tag{11}$$

$$\mathrm{idf}_i = \log \frac{|D|}{|\{d : d \ni t_i\}|} \tag{12}$$

where $n_i$ is the number of occurrences of term $i$, $|D|$ is the total number of documents, and $|\{d : d \ni t_i\}|$ is the number of documents containing term $i$, so rarer terms get a larger idf [12]. Web pages retrieved with the selected keywords then serve as adaptation text [24].
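A minimal sketch of Eqs. (10)-(12) for keyword selection, scoring the terms of a hypothetical transcript against a small document collection; the documents are illustrative assumptions.

```python
import math

def tf_idf(term, doc, collection):
    """tf-idf of Eqs. (10)-(12): tf is the term's relative frequency in the
    document; idf = log(|D| / number of documents containing the term)."""
    tf = doc.count(term) / len(doc)                      # Eq. (11)
    df = sum(1 for d in collection if term in d)         # |{d : d contains t_i}|
    idf = math.log(len(collection) / df) if df else 0.0  # Eq. (12)
    return tf * idf                                      # Eq. (10)

docs = [
    "the n gram model is the standard language model".split(),
    "the cache model boosts recent words".split(),
    "web text is collected for adaptation".split(),
]
# Score candidate keywords of the first (hypothetical) transcript.
for term in ["gram", "the", "model"]:
    print(term, tf_idf(term, docs[0], docs))
# "gram" scores highest: frequent here, absent elsewhere.
```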

4.3 Speech-Style Adaptation

Spontaneous speech differs in style from written text, so transcripts annotated with spoken-style phenomena are used to adapt the N-gram model [25, 26] (Fig.7 An Example of Tagged Transcript).

5 Out-of-Vocabulary Words

An N-gram model can assign probabilities only to words in its vocabulary; any other word is mapped to a special unknown-word token UNK, which receives probabilities such as $P(\mathrm{UNK} \mid w_{i-2} w_{i-1})$ but can never be output as the word itself.

One remedy is a hierarchical language model [27] (Fig.8 A Hierarchical Language Model), which embeds class-dependent word models inside the N-gram so that out-of-vocabulary words can be recognized rather than collapsed into UNK. Another is to keep the vocabulary up to date by harvesting new words from the Web [28] and from Wikipedia [29].

6 Conclusion

This paper reviewed the N-gram language model, the de facto standard language model for LVCSR, introduced three major issues of the model, and surveyed recent research addressing them.

References

[1] (In Japanese; entry lost in extraction.)
[2] (In Japanese.) Vol. 72, No. 8.
[3] (In Japanese.)
[4] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 1948.
[5] (In Japanese.) Vol. 2008, No. 68.
[6] R. Rosenfeld. Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 2000.
[7] (In Japanese.) Vol. 5, pp. 1-21.
[8] R. Kuhn and R. De Mori. A cache-based natural language model for speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 6, 1990.
[9] R. Rosenfeld. A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language, Vol. 10, No. 3, p. 187, 1996.
[10] R. Kneser. Statistical language modeling using a variable context length. In Proc. ICSLP '96, Vol. 1. IEEE, 1996.
[11] C. Chelba and F. Jelinek. Recognition performance of a structured language model. arXiv preprint cs/.

[12] (In Japanese.) Vol. 1.
[13] (In Japanese.) Vol. 50, No. 2.
[14] (In Japanese.)
[15] (In Japanese.) SLP, Vol. 2005, No. 69.
[16] A. Berger and R. Miller. Just-in-time language modelling. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 2, 1998.
[17] D. Vaufreydaz, M. Akbar, and J. Rouillard. Internet documents: A rich source for spoken language modeling. In Proc. ASRU '99 Workshop, 1999.
[18] T. Hofmann. Probabilistic latent semantic analysis. In Proc. Uncertainty in Artificial Intelligence (UAI '99), 1999.
[19] D. Gildea and T. Hofmann. Topic-based language modeling using EM. In Proc. Eurospeech, 1999.
[20] (In Japanese; title mentions PLSA.) SLP, Vol. 2003, No. 124.
[21] (In Japanese; title mentions PLSA.)
[22] (In Japanese; title mentions the Web.)
[23] Gerard Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Longman, Boston, MA, 1989.
[24] (In Japanese; title mentions the Web.)
[25] (In Japanese.) NLC, Vol. 105, No. 494.
[26] (In Japanese.) SLP, Vol. 2007, No. 75, pp. 1-6.
[27] K. Tanigaki, H. Yamamoto, and Y. Sagisaka. A hierarchical language model incorporating class-dependent word models for OOV words recognition. In Proc. 6th International Conference on Spoken Language Processing (ICSLP), Vol. 3, 2000.
[28] (In Japanese; title mentions the Web.) Vol. 2008, No. 3.
[29] (In Japanese; title mentions Wikipedia.) Vol. 11.
