Chapter 3: Basics of Language Modeling

Size: px

Start display at page:

Download "Chapter 3: Basics of Language Modeling"

Angelica Cole
5 years ago
Views:

1 Chapter 3: Basics of Language Modeling

2 Section 3.1. Language Modeling in Automatic Speech Recognition (ASR) All graphs in this section are from the book by Schukat-Talamazzini unless indicated otherwise 2 2

3 Complexity of Speech Recognition Applications 3 3 From: Schukat-Talamazzini: Spracherkennung

4 What makes Speech Recognition difficult 4 4 From: Schukat-Talamazzini: Spracherkennung

5 Milestones in ASR 5 5 From: Schukat-Talamazzini: Spracherkennung

6 Introduction into ASR 6 6 Source: Steve Young: the HTK book

7 Basic Components of an ASR-System Speech input Stream of feature vectors A Feature Extraction Search h ^ W=argmax [P(A W) P(W)] [W] Acoustic Model P(A W) LanguageModel P(W) Recognised Word Sequence ^ W 7 7

8 The Markov Assumption Cutting history to M-1 words: Special cases: Trigram P(wi w1, w2,... wi 1) P(wi wi M 1,... wi 1) P(wi w1, w2,... wi 1) P(wi wi 2, wi 1) Bigram P(wi w1, w2,... w i 1) P(wi wi 1) Unigram P(w i w, w 1 2,... w i 1 ) P(w ) i Zerogram (uniform distribution) P(wi w1, w2,... w i 1) 1 W 8 8

9 Section 3.2. A Measure for LM Quality: Preplexity 9 9

10 Motivation Speech recognition can be computationally expensive For research and development: need a simple evaluation metric for fast turn-around times in experiments 10 10

11 Test your Language Model What s in your hometown newspaper??? 11 11

12 Test your Language Model What s in your hometown newspaper today 12 12

13 Test your Language Model It s raining cats and??? 13 13

14 Test your Language Model It s raining cats and dogs 14 14

15 Test your Language Model President Bill??? 15 15

16 Test your Language Model President Bill Gates 16 16

17 Definition of Perplexity Let w 1, w 2, w N be an independent test corpus not used during training. Perplexity PP P( w -1/N 1, w2,... wn ) Normalized probability of test corpus 17 17

18 Example: Zerogram Language Model Zerogram P(wi w1, w2,... w i 1) (uniform distribution) 1 W PP W N i1 P( w 1 W 1, w 2 1/ N,... w N ) -1/N Interpretation: perplexity is the average de-facto size of vocabulary 18 18

19 Alternate Formulation Assumption: Use an M-gram language model History h i =w i-m+1 w i-1 Idea: collapse all identical histories 19 19

20 Alternate Formulation Example: M=2 Corpus: to be or not to be h 2 = to w 2 = be h 6 = to w 6 = be a calculate P( be to ) only once and scale it by

21 Alternate Formulation PP P( w N P( w i h i ) i1 exp log -1/N 1, w2,... wn ) N i1 1/ N P( w i h i ) 1/ N Use Bayes decomposition Try to get rid of product exp 1 log N N i1 P( w i h i ) 21 21

22 Alternate Formulation exp 1 log N N i1 P( w i h i ) exp N 1 logp( w i h i ) N 1 i exp 1 N( w, h)logp( w h) N, w h N(w,h): absolute frequency of sequence h,w on test corpus exp w, h f ( w, h)log P( w h) f(w,h) relative frequency of sequence h,w on test corpus 22 22

23 Alternate Formulation: Final Result PP exp P( w... w 1 w, h f N ) 1/ N ( w, h)log P( w h) 23 23

24 Alternate Formulation: Zerogram Example Zerogram (uniform distribution) P(wi w1, w2,... w i 1) 1 W PP exp w, h f ( w, h)log 1 exp log f ( w, h) W w, h P( w h) exp w, h 1 f ( w, h)log W log W * W exp 1 Identical result 24 24

25 Perplexity and Error Rate Perplexity is correlated to word error rate Power law relation Klakow + Peters in Speech Communication

26 Quality Measure: Mean Rank Definition: Let w follow h in the test text Sort all words after a given history according to p(w_i h) Determine position of correct word w Avarage over all events in the test text 26 26

27 Alternative: Average Rank Source: Witten, Bell: Text Compression

28 Quality Measure: Mean Rank Measure Perplexity Mean Rank Correlation

29 Quality Measure: worst Score Which fraction of the test text have a score worse than S c Correlation:

30 Summary Speech recognition Basic task Language model as a component Perplexity: Measures quality of language model 30 30

Chapter 3: Basics of Language Modelling

Chapter 3: Basics of Language Modelling Motivation Language Models are used in Speech Recognition Machine Translation Natural Language Generation Query completion For research and development: need a simple