Language Processing with Perl and Prolog

1 Language Processing with Perl and Prolog
Chapter 5: Counting Words
Pierre Nugues, Lund University

2 Counting Words and Word Sequences

Words have specific contexts of use. Pairs of words like strong and tea or powerful and computer are not random associations.
Psycholinguistics tells us that it is difficult to hear the difference between writer and rider without context: a listener will discard the improbable rider of books and prefer writer of books.
A language model is a statistical estimate of a word sequence. It was originally developed for speech recognition.
The language model component makes it possible to predict the next word given a sequence of previous words: the writer of books, novels, poetry, etc., and not the writer of hooks, nobles, poultry, ...

3 Getting the Words from a Text: Tokenization

Tokenization arranges a list of characters:
    [l, i, s, t, ' ', o, f, ' ', c, h, a, r, a, c, t, e, r, s]
into words:
    [list, of, characters]
It is sometimes tricky:
- Dates: 28/02/96
- Numbers: 9,812.345 (English), 9 812,345 (French and German), 9.812,345 (old-fashioned French)
- Abbreviations: km/h, m.p.h.
- Acronyms: S.N.C.F.

4 Tokenizing in Perl

    use utf8;
    binmode(STDOUT, ":encoding(UTF-8)");
    binmode(STDIN, ":encoding(UTF-8)");
    $text = <>;
    while ($line = <>) {
        $text .= $line;
    }
    $text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'\-,.?!:;/\n/cs;
    $text =~ s/([,.?!:;])/\n$1\n/g;
    $text =~ s/\n+/\n/g;
    print $text;

5 Improving Tokenization

The tokenization algorithm above is word-based: it defines the characters a token may contain.
It does not work on nomenclatures such as Item #N23-SW32A, dates, or numbers.
Instead, it is possible to use a boundary-based strategy with spaces (using for instance \s) and punctuation (sketched below).
But punctuation signs like commas, dots, or dashes can also be parts of tokens.
Possible improvements using microgrammars.
At some point, a dictionary is needed:
    Can't -> can n't, we'll -> we 'll, J'aime -> j' aime, but aujourd'hui
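
To make the boundary-based strategy concrete, here is a minimal Perl sketch (not the book's program): it splits on white space, keeps nomenclature-like tokens containing digits in one piece, and detaches surrounding punctuation elsewhere. The regular expressions are illustrative assumptions.

    # Boundary-based tokenization sketch: split on white space (\s+),
    # then detach punctuation glued to ordinary words.
    my $text = join("", <>);
    my @tokens;
    for my $chunk (split /\s+/, $text) {
        # Keep nomenclature-like items (Item #N23-SW32A, 28/02/96) intact.
        if ($chunk =~ /\d/ && $chunk =~ m{^[\w#/.\-]+$}) {
            push @tokens, $chunk;
            next;
        }
        $chunk =~ s/([,.?!:;])/ $1 /g;          # isolate punctuation signs
        push @tokens, grep { length } split /\s+/, $chunk;
    }
    print "$_\n" for @tokens;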

6 Sentence Segmentation

Grefenstette and Tapanainen (1994) used the Brown corpus and experimented with increasingly complex rules.
The simplest rule, a period corresponds to a sentence boundary, yields 93.20% correctly segmented sentences.
Recognizing numbers first improves this (see the sketch below):
    [0-9]+(\/[0-9]+)+                Fractions, dates
    ([+\-])?[0-9]+(\.)?[0-9]*%       Percentages
    ([0-9]+,?)+(\.[0-9]+|[0-9]+)*    Decimal numbers
With these rules, 93.78% of the sentences are correctly segmented.
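
A possible Perl sketch of this strategy (an assumption, not Grefenstette and Tapanainen's code): the number patterns above are matched first and their periods replaced by a hypothetical <DOT> placeholder, so that the remaining periods can be treated as sentence boundaries.

    my $text = join("", <>);
    # Protect periods inside numbers before segmenting.
    for my $num_re (
        qr{[0-9]+(?:/[0-9]+)+},           # fractions, dates: 28/02/96
        qr{[+\-]?[0-9]+(?:\.[0-9]*)?%},   # percentages: -4.5%
        qr{(?:[0-9]+,?)+\.[0-9]+},        # decimal numbers: 9,812.345
    ) {
        $text =~ s{($num_re)}{ (my $n = $1) =~ s/\./<DOT>/g; $n }ge;
    }
    # Simplest rule: every remaining period is a sentence boundary.
    $text =~ s/\.\s*/.\n/g;
    $text =~ s/<DOT>/./g;                 # restore the protected periods
    print $text;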

7 Abbreviations

Common patterns (Grefenstette and Tapanainen 1994):
- single capitals: A., B., C.
- letters and periods: U.S., i.e., m.p.h.
- a capital letter followed by a sequence of consonants: Mr., St., Assn.
Regexes for these patterns:
    [A-Za-z]\.
    [A-Za-z]\.([A-Za-z0-9]\.)+
    [A-Z][bcdfghj-np-tvxz]+\.
With them, correct segmentation increases to 97.66%; with an abbreviation dictionary, to 99.07%.

8 N-Grams

The types are the distinct words of a text, while the tokens are all the words or symbols.
The phrases from Nineteen Eighty-Four
    War is peace
    Freedom is slavery
    Ignorance is strength
have 9 tokens and 7 types.
Unigrams are single words, bigrams are sequences of two words, and trigrams are sequences of three words.
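
As a quick check of the token/type distinction, a short Perl sketch that counts both for the three slogans:

    my @tokens = split /\s+/,
        "War is peace Freedom is slavery Ignorance is strength";
    my %types;
    $types{$_}++ for @tokens;             # 'is' occurs three times but is one type
    printf "%d tokens, %d types\n", scalar @tokens, scalar keys %types;
    # Prints: 9 tokens, 7 types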

9 Trigrams

Word        Rank    More likely alternatives
We          9       The This One Two A Three Please In
need        7       are will the would also do
to          1
resolve     85      have know do ...
all         9       the this these problems ...
of          2       the
the         1
important   657     document question first ...
issues      14      thing point to ...
within      74      to of and in that ...
the         1
next        2       company
two         5       page exhibit meeting day
days        5       weeks years pages months

10 Counting Words in Perl: Useful Features

Useful instructions and features: split, sort, and associative arrays (hash tables):

    @words = split(/\n/, $text);
    $wordcount{"a"} = 21;
    $wordcount{"and"} = 10;
    $wordcount{"the"} = 18;
    keys %wordcount
    sort @array

11 Counting Words in Perl

    use utf8;
    binmode(STDOUT, ":encoding(UTF-8)");
    binmode(STDIN, ":encoding(UTF-8)");
    $text = <>;
    while ($line = <>) {
        $text .= $line;
    }
    $text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'\-,.?!:;/\n/cs;
    $text =~ s/([,.?!:;])/\n$1\n/g;
    $text =~ s/\n+/\n/g;
    @words = split(/\n/, $text);

12 Counting Words in Perl (Cont'd)

    for ($i = 0; $i <= $#words; $i++) {
        if (!exists($frequency{$words[$i]})) {
            $frequency{$words[$i]} = 1;
        } else {
            $frequency{$words[$i]}++;
        }
    }
    foreach $word (sort keys %frequency) {
        print "$frequency{$word} $word\n";
    }

13 Counting Bigrams in Perl

    @words = split(/\n/, $text);
    for ($i = 0; $i < $#words; $i++) {
        $bigrams[$i] = $words[$i] . " " . $words[$i + 1];
    }
    for ($i = 0; $i < $#words; $i++) {
        if (!exists($frequency_bigrams{$bigrams[$i]})) {
            $frequency_bigrams{$bigrams[$i]} = 1;
        } else {
            $frequency_bigrams{$bigrams[$i]}++;
        }
    }
    foreach $bigram (sort keys %frequency_bigrams) {
        print "$frequency_bigrams{$bigram} $bigram\n";
    }

14 Probabilistic Models of a Word Sequence

P(S) = P(w_1, ..., w_n)
     = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) ... P(w_n | w_1, ..., w_{n-1})
     = \prod_{i=1}^{n} P(w_i | w_1, ..., w_{i-1}).

The probability P(It was a bright cold day in April), from Nineteen Eighty-Four, corresponds to the probability of It beginning the sentence, then was knowing that we have It before, then a knowing that we have It was before, and so on until the end of the sentence:

P(S) = P(It) P(was | It) P(a | It, was) P(bright | It, was, a) ... P(April | It, was, a, bright, ..., in).

15 Approximations

Bigrams:
    P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-1}).
Trigrams:
    P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-2}, w_{i-1}).
Using a trigram language model, P(S) is approximated as:
    P(S) ≈ P(It) P(was | It) P(a | It, was) P(bright | was, a) ... P(April | day, in).

16 Maximum Likelihood Estimate

Bigrams:
    P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / \sum_w C(w_{i-1}, w) = C(w_{i-1}, w_i) / C(w_{i-1}).
Trigrams:
    P_MLE(w_i | w_{i-2}, w_{i-1}) = C(w_{i-2}, w_{i-1}, w_i) / C(w_{i-2}, w_{i-1}).
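
Continuing the counting programs above, a sketch of the bigram maximum likelihood estimate; %frequency and %frequency_bigrams are the hashes built on slides 12 and 13:

    # P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})
    foreach my $bigram (sort keys %frequency_bigrams) {
        my ($w1, $w2) = split / /, $bigram;
        printf "P(%s | %s) = %.6f\n",
               $w2, $w1, $frequency_bigrams{$bigram} / $frequency{$w1};
    }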

17 Conditional Probabilities

A common mistake in computing the conditional probability P(w_i | w_{i-1}) is to use C(w_{i-1}, w_i) / #bigrams.
This is not correct: this formula corresponds to P(w_{i-1}, w_i). The correct estimate is

    P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / \sum_w C(w_{i-1}, w) = C(w_{i-1}, w_i) / C(w_{i-1}).

Proof:
    P(w_1, w_2) = P(w_1) P(w_2 | w_1) = (C(w_1) / #words) × (C(w_1, w_2) / C(w_1)) = C(w_1, w_2) / #words.

18 Training the Model

The model is trained on one part of the corpus, the training set, and tested on a different part, the test set.
The vocabulary can be derived from the corpus, for instance the 20,000 most frequent words, or from a lexicon.
It can be closed or open:
- A closed vocabulary does not accept any new word.
- An open vocabulary maps the new words, either in the training or test sets, to a specific symbol, <UNK> (see the sketch below).
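
A sketch of the open-vocabulary step (the 20,000-word cutoff and the variable names are assumptions): keep the most frequent training words and map everything else to <UNK>.

    # Vocabulary: the 20,000 most frequent words of the training set.
    my @by_freq = sort { $frequency{$b} <=> $frequency{$a} } keys %frequency;
    my $last    = $#by_freq < 19_999 ? $#by_freq : 19_999;
    my %vocabulary = map { $_ => 1 } @by_freq[0 .. $last];

    # Map out-of-vocabulary words to the <UNK> symbol.
    my @words_unk = map { exists $vocabulary{$_} ? $_ : '<UNK>' } @words;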

19 Probability of a Sentence: Unigrams

<s> A good deal of the literature of the past was, indeed, already being transformed in this way </s>

Each word w_i of the sentence is scored with the unigram estimate P_MLE(w_i) = C(w_i) / #words, computed from its count in the corpus; for instance, C(<s>) = 7072.

20 Probability of a Sentence: Bigrams

<s> A good deal of the literature of the past was, indeed, already being transformed in this way </s>

Each bigram w_{i-1}, w_i of the sentence (<s> a, a good, good deal, deal of, of the, the literature, literature of, of the, the past, past was, was indeed, indeed already, already being, being transformed, transformed in, in this, this way, way </s>) is scored with P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1}).
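
Multiplying these estimates gives the bigram probability of the sentence. The sketch below assumes the %frequency and %frequency_bigrams hashes from the counting programs and a training corpus marked with <s> and </s>:

    my @sentence = qw(<s> a good deal of the literature of the past
                      was indeed already being transformed in this way </s>);
    my $prob = 1;
    for my $i (1 .. $#sentence) {
        my $c_bigram = $frequency_bigrams{"$sentence[$i - 1] $sentence[$i]"} || 0;
        my $c_word   = $frequency{$sentence[$i - 1]} || 0;
        # An unseen bigram or word zeroes the whole product:
        # this is the sparse-data problem addressed by smoothing.
        $prob = $c_word ? $prob * $c_bigram / $c_word : 0;
    }
    print "P(S) = $prob\n";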

21 Sparse Data

Given a vocabulary of 20,000 types, the potential number of bigrams is 20,000^2 = 400,000,000; with trigrams, 20,000^3 = 8,000,000,000,000.
Methods to deal with unseen n-grams:
- Laplace: add one to all counts.
- Linear interpolation:
    P_DelInterpolation(w_n | w_{n-2}, w_{n-1}) = λ_1 P_MLE(w_n | w_{n-2}, w_{n-1}) + λ_2 P_MLE(w_n | w_{n-1}) + λ_3 P_MLE(w_n).
- Good-Turing: the discount factor is variable and depends on the number of times an n-gram has occurred in the corpus.
- Back-off.

22 Laplace's Rule

P_Laplace(w_{i+1} | w_i) = (C(w_i, w_{i+1}) + 1) / \sum_w (C(w_i, w) + 1) = (C(w_i, w_{i+1}) + 1) / (C(w_i) + Card(V)),

where Card(V) is the size of the vocabulary. Applied to the example sentence, each bigram w_i, w_{i+1} receives the estimate (C(w_i, w_{i+1}) + 1) / (C(w_i) + Card(V)), including the bigrams that were never observed.
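
A Perl sketch of the rule, reusing the count hashes from the earlier programs; p_laplace is a hypothetical helper name:

    my $card_v = scalar keys %frequency;    # Card(V): number of word types

    # P_Laplace(w_{i+1} | w_i) = (C(w_i, w_{i+1}) + 1) / (C(w_i) + Card(V))
    sub p_laplace {
        my ($w1, $w2) = @_;
        my $c_bigram = $frequency_bigrams{"$w1 $w2"} || 0;
        my $c_word   = $frequency{$w1} || 0;
        return ($c_bigram + 1) / ($c_word + $card_v);
    }

    printf "%.8f\n", p_laplace("good", "deal");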

23 Good-Turing

Laplace's rule shifts an enormous mass of probability to very unlikely bigrams. Good-Turing's estimation is more effective.
Let N_c denote the number of n-grams that occurred exactly c times in the corpus: N_0 is the number of unseen n-grams, N_1 the number of n-grams seen once, N_2 the number of n-grams seen twice, and so on.
The frequency of n-grams occurring c times is re-estimated as:
    c* = (c + 1) E(N_{c+1}) / E(N_c).
Unseen n-grams: c* = N_1 / N_0; n-grams seen once: c* = 2 N_2 / N_1.
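
A sketch of the re-estimation on bigram counts: collect the frequencies of frequencies N_c, then apply c* = (c + 1) N_{c+1} / N_c. N_0 is derived from the vocabulary size; the variable names are assumptions.

    # N_c: number of bigrams seen exactly c times.
    my %N;
    $N{$_}++ for values %frequency_bigrams;

    my $vocab_size   = scalar keys %frequency;
    my $seen_bigrams = scalar keys %frequency_bigrams;
    $N{0} = $vocab_size ** 2 - $seen_bigrams;        # unseen bigrams

    # Re-estimated counts c* = (c + 1) * N_{c+1} / N_c, where N_{c+1} is available.
    my %c_star;
    for my $c (sort { $a <=> $b } keys %N) {
        $c_star{$c} = ($c + 1) * $N{$c + 1} / $N{$c} if $N{$c + 1};
    }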

24 Good-Turing for Nineteen Eighty-Four

Nineteen Eighty-Four contains 37,365 unique bigrams and 5,820 bigrams seen twice.
Its vocabulary of 8,635 words generates 8,635^2 = 74,563,225 bigrams, of which 74,513,701 are unseen.
Unseen bigrams are re-estimated as c* = N_1 / N_0 = 37,365 / 74,513,701 ≈ 0.0005, and bigrams seen once as c* = 2 N_2 / N_1 = 2 × 5,820 / 37,365 ≈ 0.31.

25 Backoff

If there is no bigram, then use unigrams:

    P_Backoff(w_i | w_{i-1}) = P(w_i | w_{i-1}),   if C(w_{i-1}, w_i) ≠ 0,
                             = α P(w_i),           otherwise.

With maximum likelihood estimates:

    P_Backoff(w_i | w_{i-1}) = P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1}),   if C(w_{i-1}, w_i) ≠ 0,
                             = P_MLE(w_i) = C(w_i) / #words,                          otherwise.
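
A direct Perl transcription of this rule as a sketch (p_backoff is a hypothetical helper; the α weight is omitted, so, as the next slide notes, the figures are not true probabilities):

    my $nbr_words = 0;
    $nbr_words += $_ for values %frequency;          # #words

    sub p_backoff {
        my ($w1, $w2) = @_;
        if ($frequency_bigrams{"$w1 $w2"}) {
            return $frequency_bigrams{"$w1 $w2"} / $frequency{$w1};  # bigram MLE
        }
        return ($frequency{$w2} || 0) / $nbr_words;                  # back off to unigram
    }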

26 Backoff: Example

For the example sentence, the bigrams good deal, was indeed, indeed already, already being, being transformed, and transformed in are unseen, so the model backs off to their unigram estimates; the other bigrams (<s> a, a good, deal of, of the, the literature, literature of, the past, past was, in this, this way, way </s>) keep their bigram estimates.
The figures we obtain are not probabilities. We can use the Good-Turing technique to discount the bigrams and then scale the unigram probabilities: this is the Katz backoff.

27 Quality of a Language Model

Per-word probability of a word sequence:
    H(L) = -(1/n) log_2 P(w_1, ..., w_n).
Entropy rate:
    H_rate = -(1/n) \sum_{w_1,...,w_n ∈ L} p(w_1, ..., w_n) log_2 p(w_1, ..., w_n).
Cross entropy:
    H(p, m) = -(1/n) \sum_{w_1,...,w_n ∈ L} p(w_1, ..., w_n) log_2 m(w_1, ..., w_n).
We have:
    H(p, m) = lim_{n→∞} -(1/n) \sum_{w_1,...,w_n ∈ L} p(w_1, ..., w_n) log_2 m(w_1, ..., w_n)
            = lim_{n→∞} -(1/n) log_2 m(w_1, ..., w_n).
We compute the cross entropy on the complete word sequence of a test set, governed by p, using a bigram or trigram model, m, derived from a training set.
Perplexity:
    PP(p, m) = 2^{H(p, m)}.
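
A sketch of the evaluation loop on a tokenized test set @test, using a conditional probability function such as p_backoff above; it assumes the model assigns every test word a nonzero probability (a smoothed model in practice):

    my $log_prob = 0;
    for my $i (1 .. $#test) {
        # log2 m(w_i | w_{i-1}); a zero probability would make log() fail.
        $log_prob += log(p_backoff($test[$i - 1], $test[$i])) / log(2);
    }
    my $cross_entropy = -$log_prob / $#test;         # H(p, m) in bits per word
    my $perplexity    = 2 ** $cross_entropy;         # PP(p, m) = 2^H(p, m)
    printf "Cross entropy: %.3f bits/word, perplexity: %.1f\n",
           $cross_entropy, $perplexity;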

28 Other Statistical Formulas

Mutual information (the strength of an association):
    I(w_i, w_j) = log_2 [ P(w_i, w_j) / (P(w_i) P(w_j)) ] ≈ log_2 [ N C(w_i, w_j) / (C(w_i) C(w_j)) ].
T-score (the confidence of an association):
    t(w_i, w_j) = [ mean(P(w_i, w_j)) - mean(P(w_i)) mean(P(w_j)) ] / \sqrt( σ²(P(w_i, w_j)) + σ²(P(w_i) P(w_j)) )
                ≈ [ C(w_i, w_j) - (1/N) C(w_i) C(w_j) ] / \sqrt( C(w_i, w_j) ).

29 T-Scores with the Word "set"

In the Bank of English, bigrams of the form set + word with the highest t-scores include set up, set a, set to, set off, and set out.
Source: Bank of English.

30 Mutual Information with the Word "surgery"

In the Bank of English, bigrams of the form word + surgery with the highest mutual information include arthroscopic surgery, pioneeing surgery, reconstructive surgery, refractive surgery, and rhinoplasty surgery.
Source: Bank of English.

31 Mutual Information and T-Scores in Perl

    @words = split(/\n/, $text);
    for ($i = 0; $i < $#words; $i++) {
        $bigrams[$i] = $words[$i] . " " . $words[$i + 1];
    }
    for ($i = 0; $i <= $#words; $i++) {
        $frequency{$words[$i]}++;
    }
    for ($i = 0; $i < $#words; $i++) {
        $frequency_bigrams{$bigrams[$i]}++;
    }

32 Mutual Information in Perl

    for ($i = 0; $i < $#words; $i++) {
        $mutual_info{$bigrams[$i]} = log(($#words + 1) *
            $frequency_bigrams{$bigrams[$i]} /
            ($frequency{$words[$i]} * $frequency{$words[$i + 1]})) / log(2);
    }
    foreach $bigram (keys %mutual_info) {
        @bigram_array = split(/ /, $bigram);
        print $mutual_info{$bigram}, " ", $bigram, "\t",
            $frequency_bigrams{$bigram}, "\t",
            $frequency{$bigram_array[0]}, "\t",
            $frequency{$bigram_array[1]}, "\n";
    }

33 T-Scores in Perl

    for ($i = 0; $i < $#words; $i++) {
        $t_scores{$bigrams[$i]} =
            ($frequency_bigrams{$bigrams[$i]} -
             $frequency{$words[$i]} * $frequency{$words[$i + 1]} / ($#words + 1)) /
            sqrt($frequency_bigrams{$bigrams[$i]});
    }
    foreach $bigram (keys %t_scores) {
        @bigram_array = split(/ /, $bigram);
        print $t_scores{$bigram}, " ", $bigram, "\t",
            $frequency_bigrams{$bigram}, "\t",
            $frequency{$bigram_array[0]}, "\t",
            $frequency{$bigram_array[1]}, "\n";
    }

34 Information Retrieval: The Vector Space Model

The vector space model is a technique to compute the similarity of two documents or to match a document and a query.
It represents a document in a word space:

    Documents \ Words   w_1           w_2           w_3           ...   w_m
    D_1                 C(w_1, D_1)   C(w_2, D_1)   C(w_3, D_1)   ...   C(w_m, D_1)
    D_2                 C(w_1, D_2)   C(w_2, D_2)   C(w_3, D_2)   ...   C(w_m, D_2)
    ...
    D_n                 C(w_1, D_n)   C(w_2, D_n)   C(w_3, D_n)   ...   C(w_m, D_n)

We compute the similarity of two documents through their dot product.

35 The Vector Space Model: Example

A collection of two documents D1 and D2:
    D1: Chrysler plans new investments in Latin America.
    D2: Chrysler plans major investments in Mexico.
The vectors representing the two documents:

    D.   america  chrysler  in  investments  latin  major  mexico  new  plans
    D1   1        1         1   1            1      0      0       1    1
    D2   0        1         1   1            0      1      1       0    1

The vector space model represents documents as bags of words (BOW) that do not take the word order into account.
The dot product is D1 · D2 = 0 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 1 = 4.
Their cosine is D1 · D2 / (||D1|| ||D2||) = 4 / (√7 × √6) ≈ 0.62.
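
The same example computed in Perl, as a self-contained sketch: the documents are turned into bag-of-words hashes, then compared with the dot product and the cosine.

    my (%d1, %d2);
    $d1{lc $_}++ for qw(Chrysler plans new investments in Latin America);
    $d2{lc $_}++ for qw(Chrysler plans major investments in Mexico);

    my ($dot, $norm1, $norm2) = (0, 0, 0);
    $dot   += $d1{$_} * ($d2{$_} || 0) for keys %d1;
    $norm1 += $_ ** 2 for values %d1;
    $norm2 += $_ ** 2 for values %d2;

    printf "Dot product: %d, cosine: %.2f\n",
           $dot, $dot / (sqrt($norm1) * sqrt($norm2));   # 4 and 0.62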

36 Giving a Weight

Word clouds give visual weights to words.
[Image: a word cloud, courtesy of Jonas Wisbrant]

37 TF × IDF

The frequency alone might be misleading. Document coordinates are in fact tf × idf: term frequency by inverted document frequency.
- Term frequency tf_{i,j}: the frequency of term j in document i.
- Inverted document frequency: idf_j = log(N / n_j), where N is the number of documents in the collection and n_j the number of documents containing term j.
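
A sketch of the weighting over a small collection, where @documents is an assumed array of per-document term-frequency hashes like the rows of the matrix on slide 34:

    my $N = scalar @documents;               # number of documents

    # Document frequencies n_j: in how many documents term j appears.
    my %df;
    for my $doc (@documents) {
        $df{$_}++ for keys %$doc;
    }

    # tf.idf coordinates: tf_{i,j} * log(N / n_j).
    # Perl's log is the natural logarithm; the base only rescales the weights.
    my @tfidf;
    for my $doc (@documents) {
        push @tfidf, { map { $_ => $doc->{$_} * log($N / $df{$_}) } keys %$doc };
    }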

38 Document Similarity

Documents are vectors whose coordinates can be the count of each word:
    d = (C(w_1), C(w_2), C(w_3), ..., C(w_n)).
The similarity between two documents, or between a query and a document, is given by their cosine:
    cos(q, d) = \sum_{i=1}^{n} q_i d_i / ( \sqrt(\sum_{i=1}^{n} q_i^2) × \sqrt(\sum_{i=1}^{n} d_i^2) ).
Applications: Lucene, Wikipedia.

39 Inverted Index (Source: Apple)

UserExperience/Conceptual/SearchKitConcepts/index.html
