Chapter 3: Basics of Language Modelling


2 Motivation
Language models are used in:
- Speech recognition
- Machine translation
- Natural language generation
- Query completion
For research and development: a simple evaluation metric is needed for fast turnaround times in experiments.

3 Test your Language Model
What's in your hometown newspaper ???

4 Test your Language Model
What's in your hometown newspaper today

5 Test your Language Model
It's raining cats and ???

6 Test your Language Model
It's raining cats and dogs

7 Test your Language Model
President Bill ???

8 Test your Language Model
President Bill Gates

9 Definition: Language Model
A language model either
- predicts the next word $w$ given a sequence of predecessor words $h$ (the history):
  $$P(w_i \mid h_i) = P(w_i \mid w_1, \ldots, w_{i-1})$$
  with $i$ the position in the text, or
- predicts the probability of a sequence of words:
  $$P(W) = P(w_1, w_2, \ldots, w_{N-1}, w_N)$$
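
The two views in this definition are linked by the chain rule of probability: the probability of a whole sequence is the product of the next-word probabilities, each conditioned on its full history. As a worked equation, using only the symbols from the definition:

```latex
P(w_1, w_2, \ldots, w_N)
  = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2) \cdots P(w_N \mid w_1, \ldots, w_{N-1})
  = \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1})
```

For $i = 1$ the history is empty, so the first factor is simply $P(w_1)$.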

10 Language Model in Speech Recognition (ASR)
[Figure: Speech input -> Feature extraction -> stream of feature vectors $A$ -> Search -> recognised word sequence $\hat{W}$. The search combines the acoustic model $P(A \mid W)$ and the language model $P(W)$.]
$$\hat{W} = \arg\max_{W} \left[ P(A \mid W)\, P(W) \right]$$
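
The search criterion on this slide can be illustrated with a very small sketch. It assumes the search space has already been reduced to a short list of candidate word sequences, which is a simplification: a real recogniser searches a far larger space. The candidate sentences, the probabilities, and the names `hypotheses` and `combined_log_score` are all invented for illustration.

```python
import math

# Hypothetical candidate list: all probabilities are invented for illustration.
hypotheses = [
    {"words": "recognise speech", "p_acoustic": 0.4, "p_lm": 1e-2},
    {"words": "wreck a nice beach", "p_acoustic": 0.5, "p_lm": 1e-4},
]

def combined_log_score(hyp):
    # log P(A|W) + log P(W); taking logs does not change the argmax
    # because the logarithm is monotonic.
    return math.log(hyp["p_acoustic"]) + math.log(hyp["p_lm"])

# argmax over W of P(A|W) * P(W), restricted to the candidate list
best = max(hypotheses, key=combined_log_score)
print(best["words"])
```

In this toy setting the second candidate has the better acoustic score, but the language model makes the first one win overall.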

11 Language Model in Machine Translation (MT)
[Figure: Source sentence $W_F$ -> Search -> recognised word sequence $\hat{W}_E$. The search combines the lexicon model $P(W_F \mid W_E)$ and the language model $P(W_E)$.]
$$\hat{W}_E = \arg\max_{W_E} \left[ P(W_F \mid W_E)\, P(W_E) \right]$$

12 Sentence Completion
- Direct use of $P(w \mid h)$
- Needs good search algorithms
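
As a minimal sketch of using $P(w \mid h)$ directly, the example below completes a sentence greedily: at each step it appends the most probable next word. Greedy search is only the simplest possible choice and is not claimed to be the "good search algorithm" the slide asks for. The bigram table, the end marker `</s>`, and the function name `complete_greedy` are hypothetical.

```python
# Hypothetical bigram table P(w | h) with a one-word history; values are invented.
toy_model = {
    "raining": {"cats": 0.6, "hard": 0.4},
    "cats": {"and": 0.9, "</s>": 0.1},
    "and": {"dogs": 0.7, "cats": 0.3},
    "dogs": {"</s>": 1.0},
}

def complete_greedy(model, start, max_words=5, end_token="</s>"):
    """Extend `start` by repeatedly choosing the most probable next word."""
    words = [start]
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # unseen history: nothing to predict from
        next_word = max(candidates, key=candidates.get)
        if next_word == end_token:
            break
        words.append(next_word)
    return words

completion = complete_greedy(toy_model, "raining")
print(" ".join(completion))
```

Greedy choice can miss a completion whose first word is less likely but whose overall probability is higher, which is one reason better search matters.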

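13 The Markov Assumption: truncate the history
Cutting the history to $M-1$ words:
$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-M+1}, \ldots, w_{i-1})$$
Special cases ($|W|$ is the vocabulary size):
- Zerogram (uniform distribution): $P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx \frac{1}{|W|}$
- Unigram: $P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i)$
- Bigram: $P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})$
- Trigram: $P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})$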
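
The truncated history can be made concrete with a small sketch. The slides do not say how the probabilities are estimated; the example assumes plain relative-frequency counts without smoothing, and the function name `train_mgram` is hypothetical.

```python
from collections import Counter

def train_mgram(tokens, m):
    """Relative-frequency estimate of P(w | previous m-1 words), no smoothing."""
    history_counts = Counter()
    event_counts = Counter()
    for i in range(m - 1, len(tokens)):
        history = tuple(tokens[i - m + 1:i])  # empty tuple when m == 1
        history_counts[history] += 1
        event_counts[(history, tokens[i])] += 1
    return {
        (history, word): count / history_counts[history]
        for (history, word), count in event_counts.items()
    }

corpus = "to be or not to be".split()
unigram = train_mgram(corpus, 1)  # history is always ()
bigram = train_mgram(corpus, 2)   # history is the single previous word

print(unigram[((), "to")])        # 2 of 6 tokens are "to"
print(bigram[(("to",), "be")])    # "to" is always followed by "be" here
```

Positions with fewer than $M-1$ predecessors are skipped in this sketch; padding with a sentence-start symbol would be another option.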

14 How would you measure the quality of an LM?

15 Definition of Perplexity
Let $w_1, w_2, \ldots, w_N$ be an independent test corpus not used during training.
Perplexity:
$$PP = P(w_1, w_2, \ldots, w_N)^{-1/N}$$
Interpretation: normalized probability of the test corpus

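16 Example: Zerogram Language Model
Zerogram (uniform distribution), with $|W|$ the vocabulary size:
$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) = \frac{1}{|W|}$$
$$PP = P(w_1, w_2, \ldots, w_N)^{-1/N} = \left( \prod_{i=1}^{N} \frac{1}{|W|} \right)^{-1/N} = |W|$$
Interpretation: perplexity is the average de-facto size of the vocabulary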
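
The zerogram result is easy to check numerically. The sketch below applies the definition of perplexity to a list of per-word probabilities; the function name `perplexity_from_probs`, the vocabulary size, and the corpus length are arbitrary choices for illustration.

```python
import math

def perplexity_from_probs(word_probs):
    """PP = (product of per-word probabilities) ** (-1/N)."""
    n = len(word_probs)
    return math.prod(word_probs) ** (-1.0 / n)

vocab_size = 1000   # |W|, chosen arbitrarily
corpus_length = 20  # N, chosen arbitrarily

# A zerogram model assigns 1/|W| to every word, whatever the history.
pp = perplexity_from_probs([1.0 / vocab_size] * corpus_length)
print(pp)  # close to vocab_size, up to floating-point rounding
```

Multiplying many small probabilities underflows for long corpora, so practical implementations sum log-probabilities instead.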

17 Alternate Formulation
Assumption: use an M-gram language model
History: $h_i = w_{i-M+1} \ldots w_{i-1}$
Idea: collapse all identical histories

18 Alternate Formulation
Example: $M = 2$
Corpus: to be or not to be
$h_2$ = to, $w_2$ = be
$h_6$ = to, $w_6$ = be
-> calculate $P(\text{be} \mid \text{to})$ only once and scale it by the number of times the pair occurs

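19 Alternate Formulation
Use Bayes decomposition:
$$PP = P(w_1, w_2, \ldots, w_N)^{-1/N} = \left( \prod_{i=1}^{N} P(w_i \mid h_i) \right)^{-1/N}$$
Try to get rid of the product:
$$PP = \exp\left( \log \left( \prod_{i=1}^{N} P(w_i \mid h_i) \right)^{-1/N} \right) = \exp\left( -\frac{1}{N} \log \prod_{i=1}^{N} P(w_i \mid h_i) \right)$$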

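20 Alternate Formulation
$$\exp\left( -\frac{1}{N} \log \prod_{i=1}^{N} P(w_i \mid h_i) \right) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid h_i) \right)$$
$$= \exp\left( -\frac{1}{N} \sum_{w,h} N(w,h) \log P(w \mid h) \right)$$
$N(w,h)$: absolute frequency of the sequence $h, w$ in the test corpus
$$= \exp\left( -\sum_{w,h} f(w,h) \log P(w \mid h) \right)$$
$f(w,h)$: relative frequency of the sequence $h, w$ in the test corpus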

21 Alternate Formulation: Final Result
$$PP = P(w_1, w_2, \ldots, w_N)^{-1/N} = \exp\left( -\sum_{w,h} f(w,h) \log P(w \mid h) \right)$$
Perplexity can also be calculated using conditional probabilities trained on the training corpus and relative frequencies from the test corpus.
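
The equivalence of the two forms can be checked on the corpus "to be or not to be" with $M = 2$. In this sketch the bigram probabilities in `model` are invented, and the first word is skipped because it has no one-word history; both are assumptions made for illustration.

```python
import math
from collections import Counter

test_corpus = "to be or not to be".split()

# Hypothetical bigram probabilities P(w | h), keyed by (h, w); values are invented.
model = {
    ("to", "be"): 0.5,
    ("be", "or"): 0.25,
    ("or", "not"): 0.5,
    ("not", "to"): 0.5,
}

# Events (h, w) for M = 2; the first word has no history and is skipped.
events = list(zip(test_corpus[:-1], test_corpus[1:]))
n = len(events)

# Direct form: one log-probability per position in the test corpus.
pp_direct = math.exp(-sum(math.log(model[e]) for e in events) / n)

# Collapsed form: one term per distinct (h, w), weighted by f(w, h) = N(w, h) / N.
counts = Counter(events)
pp_collapsed = math.exp(
    -sum((count / n) * math.log(model[e]) for e, count in counts.items())
)

print(pp_direct, pp_collapsed)
```

The pair (to, be) occurs twice, so the collapsed sum has four terms where the direct sum has five, yet both give the same perplexity.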

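22 Alternate Formulation: Zerogram Example
Zerogram (uniform distribution), with $|W|$ the vocabulary size:
$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) = \frac{1}{|W|}$$
$$PP = \exp\left( -\sum_{w,h} f(w,h) \log P(w \mid h) \right) = \exp\left( -\sum_{w,h} f(w,h) \log \frac{1}{|W|} \right)$$
$$= \exp\left( \log |W| \sum_{w,h} f(w,h) \right) = \exp\left( \log |W| \cdot 1 \right) = |W|$$
Identical result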

23 Perplexity and Error Rate
- Perplexity is correlated with word error rate
- Power-law relation
Source: Klakow and Peters, Speech Communication

24 Quality Measure: Mean Rank
Definition: let $w$ follow $h$ in the test text.
- Sort all words after a given history according to $p(w \mid h)$
- Determine the position of the correct word $w$
- Average over all events in the test text
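
The three steps of the definition can be sketched as follows. The toy distributions, the test events, and the function name `mean_rank` are hypothetical, and ranks are counted from 1. How ties between equally probable words are broken is not specified on the slide; this sketch keeps the dictionary order.

```python
def mean_rank(model, events):
    """Average 1-based rank of the correct word under p(w | h)."""
    total = 0
    for history, word in events:
        distribution = model[history]
        # Sort candidate words by descending probability.
        ranking = sorted(distribution, key=distribution.get, reverse=True)
        # Position of the correct word in that ranking.
        total += ranking.index(word) + 1
    return total / len(events)

# Hypothetical distributions p(w | h) with a one-word history; values are invented.
toy_model = {
    "to": {"be": 0.6, "go": 0.3, "or": 0.1},
    "be": {"or": 0.2, "not": 0.5, "to": 0.3},
}
test_events = [("to", "be"), ("be", "or"), ("to", "go")]

result = mean_rank(toy_model, test_events)
print(result)  # ranks are 1, 3 and 2
```

A perfect model would reach a mean rank of 1, since the correct word would always be ranked first.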

25 Alternative: Average Rank
Task: guess the next word
Metric: measure how many attempts it takes
Source: Witten, Bell: Text Compression

26 Quality Measure: Mean Rank
[Table: correlation with ASR error rate for two measures, perplexity and mean rank.]
Mean rank is equally good at predicting ASR performance.

27 Summary
Language models predict the next word.
Applications:
- Speech recognition
- Machine translation
- Natural language generation
- Information retrieval
- Sentence completion
Perplexity measures the quality of a language model.


Chapter 3: Basics of Language Modeling
Section 3.1. Language Modeling in Automatic Speech Recognition (ASR)
All graphs in this section are from the book by Schukat-Talamazzini unless indicated otherwise.