Language Processing with Perl and Prolog
1 Language Processing with Perl and Prolog, Chapter 5: Counting Words. Pierre Nugues, Lund University.
2 Counting Words and Word Sequences
Words have specific contexts of use. Pairs of words like strong and tea or powerful and computer are not random associations.
Psycholinguistics tells us that it is difficult to distinguish writer from rider without context: a listener will discard the improbable rider of books and prefer writer of books.
A language model is a statistical estimate of a word sequence. It was originally developed for speech recognition.
The language model component makes it possible to predict the next word given a sequence of previous words: the writer of books, novels, poetry, etc. and not the writer of hooks, nobles, poultry, ...
3 Getting the Words from a Text: Tokenization
Arrange a list of characters: [l, i, s, t, ' ', o, f, ' ', c, h, a, r, a, c, t, e, r, s] into words: [list, of, characters]
Sometimes tricky:
- Dates: 28/02/96
- Numbers: 9,812.345 (English), 9 812,345 (French and German), 9.812,345 (old-fashioned French)
- Abbreviations: km/h, m.p.h.
- Acronyms: S.N.C.F.
4 Tokenizing in Perl
use utf8;
binmode(STDOUT, ":encoding(UTF-8)");
binmode(STDIN, ":encoding(UTF-8)");
$text = <>;
while ($line = <>) {
    $text .= $line;
}
# Keep letters (including accented ones), hyphens, and sentence punctuation;
# map runs of everything else to a single newline
$text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ\-,.?!:;/\n/cs;
# Put punctuation signs on lines of their own and squeeze the newlines
$text =~ s/([,.?!:;])/\n$1\n/g;
$text =~ s/\n+/\n/g;
print $text;
5 Improving Tokenization
The tokenization algorithm is word-based: it defines tokens by their content.
It does not work on nomenclatures such as Item #N23-SW32A, dates, or numbers.
Instead it is possible to improve it using a boundary-based strategy with spaces (using for instance \s) and punctuation.
But punctuation signs like commas, dots, or dashes can also be parts of tokens.
Possible improvements using microgrammars.
At some point, there is a need for a dictionary:
Can't -> can n't, we'll -> we 'll, J'aime -> j' aime, but aujourd'hui stays a single token.
6 Sentence Segmentation
Grefenstette and Tapanainen (1994) used the Brown corpus and experimented with increasingly complex rules.
Simplest rule: a period corresponds to a sentence boundary: 93.20% correctly segmented.
Recognizing numbers:
[0-9]+(\/[0-9]+)+               Fractions, dates
([+\-])?[0-9]+(\.)?[0-9]*%      Percentages
([0-9]+,?)+(\.[0-9]+|[0-9]+)*   Decimal numbers
Result: 93.78% correctly segmented.
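As a rough illustration of this strategy, here is a minimal Perl sketch (not from the slides) that protects periods inside decimal numbers and percentages before splitting on sentence-final periods; the regexes, the <DOT> placeholder, and the example text are illustrative simplifications of the rules above.
$text = "The rate rose 3.5% on 28/02/96. It fell back later.";

# Protect periods inside decimal numbers and percentages with a placeholder
$text =~ s/([0-9]+)\.([0-9]+%?)/$1<DOT>$2/g;

# A period followed by white space and a capital letter is taken as a boundary
@sentences = split(/\.\s+(?=[A-Z])/, $text);

# Restore the protected periods
foreach $sentence (@sentences) {
    $sentence =~ s/<DOT>/./g;
    print "$sentence\n";
}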
7 Abbreviations
Common patterns (Grefenstette and Tapanainen 1994):
- single capitals: A., B., C.,
- letters and periods: U.S. i.e. m.p.h.,
- capital letter followed by a sequence of consonants: Mr. St. Assn.

Regex                          Correct   Errors   Full stop
[A-Za-z]\.                     1,
[A-Za-z]\.([A-Za-z0-9]\.)
[A-Z][bcdfghj-np-tvxz]+\.      1,
Totals                         3,

Correct segmentation increases to 97.66%; with an abbreviation dictionary, to 99.07%.
8 N-Grams
The types are the distinct words of a text, while the tokens are all the words or symbols.
The phrases from Nineteen Eighty-Four:
War is peace
Freedom is slavery
Ignorance is strength
have 9 tokens and 7 types.
Unigrams are single words, bigrams are sequences of two words, and trigrams are sequences of three words.
9 Trigrams
Word        Rank   More likely alternatives
We          9      The This One Two A Three Please In
need        7      are will the would also do
to          1
resolve     85     have know do ...
all         9      the this these problems ...
of          2      the
the         1
important   657    document question first ...
issues      14     thing point to ...
within      74     to of and in that ...
the         1
next        2      company
two         5      page exhibit meeting day
days        5      weeks years pages months
10 Counting Words in Perl: Useful Features
Useful instructions and features: split, sort, and associative arrays (hash tables):
@words = split(/\n/, $text);
$wordcount{"a"} = 21;
$wordcount{"and"} = 10;
$wordcount{"the"} = 18;
keys %wordcount
sort @array
11 Counting Words in Perl
use utf8;
binmode(STDOUT, ":encoding(UTF-8)");
binmode(STDIN, ":encoding(UTF-8)");
$text = <>;
while ($line = <>) {
    $text .= $line;
}
$text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ\-,.?!:;/\n/cs;
$text =~ s/([,.?!:;])/\n$1\n/g;
$text =~ s/\n+/\n/g;
@words = split(/\n/, $text);
12 Counting Words in Perl (Cont'd)
for ($i = 0; $i <= $#words; $i++) {
    if (!exists($frequency{$words[$i]})) {
        $frequency{$words[$i]} = 1;
    } else {
        $frequency{$words[$i]}++;
    }
}
foreach $word (sort keys %frequency) {
    print "$frequency{$word} $word\n";
}
13 Counting Bigrams in Perl
@words = split(/\n/, $text);
for ($i = 0; $i < $#words; $i++) {
    $bigrams[$i] = $words[$i] . " " . $words[$i + 1];
}
for ($i = 0; $i < $#words; $i++) {
    if (!exists($frequency_bigrams{$bigrams[$i]})) {
        $frequency_bigrams{$bigrams[$i]} = 1;
    } else {
        $frequency_bigrams{$bigrams[$i]}++;
    }
}
foreach $bigram (sort keys %frequency_bigrams) {
    print "$frequency_bigrams{$bigram} $bigram\n";
}
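The bigram counting loop extends directly to trigrams. A possible sketch, assuming the same @words array produced by the tokenizer above (the trigram names are added here, not from the slides):
for ($i = 0; $i < $#words - 1; $i++) {
    # Concatenate three consecutive words into one key
    $trigrams[$i] = $words[$i] . " " . $words[$i + 1] . " " . $words[$i + 2];
    $frequency_trigrams{$trigrams[$i]}++;
}
foreach $trigram (sort keys %frequency_trigrams) {
    print "$frequency_trigrams{$trigram} $trigram\n";
}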
14 Probabilistic Models of a Word Sequence
P(S) = P(w_1, ..., w_n)
     = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) ... P(w_n | w_1, ..., w_{n-1})
     = ∏_{i=1}^{n} P(w_i | w_1, ..., w_{i-1}).
The probability P(It was a bright cold day in April), from Nineteen Eighty-Four, corresponds to the probability of It beginning the sentence, then was knowing that we have It before, then a knowing that we have It was before, and so on until the end of the sentence:
P(S) = P(It) P(was | It) P(a | It, was) P(bright | It, was, a) ... P(April | It, was, a, bright, ..., in).
15 Approximations
Bigrams: P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-1}).
Trigrams: P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-2}, w_{i-1}).
Using a trigram language model, P(S) is approximated as:
P(S) ≈ P(It) P(was | It) P(a | It, was) P(bright | was, a) ... P(April | day, in).
16 Maximum Likelihood Estimate
Bigrams:
P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / Σ_w C(w_{i-1}, w) = C(w_{i-1}, w_i) / C(w_{i-1}).
Trigrams:
P_MLE(w_i | w_{i-2}, w_{i-1}) = C(w_{i-2}, w_{i-1}, w_i) / C(w_{i-2}, w_{i-1}).
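A possible Perl sketch of the bigram estimate, reusing the %frequency and %frequency_bigrams hashes from the counting programs above (the %prob_bigrams hash is an added name, not from the slides):
# P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})
foreach $bigram (keys %frequency_bigrams) {
    ($first) = split(/ /, $bigram);
    $prob_bigrams{$bigram} = $frequency_bigrams{$bigram} / $frequency{$first};
    print "$prob_bigrams{$bigram}\t$bigram\n";
}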
17 Conditional Probabilities
A common mistake in computing the conditional probability P(w_i | w_{i-1}) is to use C(w_{i-1}, w_i) / #bigrams. This is not correct: this formula corresponds to P(w_{i-1}, w_i). The correct estimate is
P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / Σ_w C(w_{i-1}, w) = C(w_{i-1}, w_i) / C(w_{i-1}).
Proof:
P(w_1, w_2) = P(w_1) P(w_2 | w_1) = (C(w_1) / #words) × (C(w_1, w_2) / C(w_1)) = C(w_1, w_2) / #words.
18 Training the Model
The model is trained on a part of the corpus: the training set.
It is tested on a different part: the test set.
The vocabulary can be derived from the corpus, for instance the 20,000 most frequent words, or from a lexicon.
It can be closed or open:
- A closed vocabulary does not accept any new word.
- An open vocabulary maps the new words, either in the training or test sets, to a specific symbol, <UNK>.
19 Probability of a Sentence: Unigrams
<s> A good deal of the literature of the past was, indeed, already being transformed in this way </s>
[Table: for each token w_i of the sentence, its count C(w_i), #words, and P_MLE(w_i); C(<s>) = 7072.]
20 Probability of a Sentence: Bigrams
<s> A good deal of the literature of the past was, indeed, already being transformed in this way </s>
[Table: for each bigram w_{i-1}, w_i of the sentence, its count C(w_{i-1}, w_i), C(w_{i-1}), and P_MLE(w_i | w_{i-1}).]
21 Sparse Data
Given a vocabulary of 20,000 types, the potential number of bigrams is 20,000^2 = 400,000,000.
With trigrams: 20,000^3 = 8,000,000,000,000.
Methods:
- Laplace: add one to all counts.
- Linear interpolation (see the sketch after this slide):
  P_DelInterpolation(w_n | w_{n-2}, w_{n-1}) = λ1 P_MLE(w_n | w_{n-2}, w_{n-1}) + λ2 P_MLE(w_n | w_{n-1}) + λ3 P_MLE(w_n).
- Good-Turing: the discount factor is variable and depends on the number of times an n-gram has occurred in the corpus.
- Back-off.
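A minimal sketch of the linear interpolation in Perl, assuming MLE hashes %prob_trigrams, %prob_bigrams, and %prob_unigrams and hand-picked weights; in practice, the λ values are tuned on held-out data, and all the names here are illustrative:
# Hypothetical weights; they must sum to 1
($lambda1, $lambda2, $lambda3) = (0.6, 0.3, 0.1);

sub p_interpolated {
    my ($w1, $w2, $w3) = @_;    # probability of $w3 given $w1, $w2
    my $p3 = $prob_trigrams{"$w1 $w2 $w3"} // 0;
    my $p2 = $prob_bigrams{"$w2 $w3"} // 0;
    my $p1 = $prob_unigrams{$w3} // 0;
    return $lambda1 * $p3 + $lambda2 * $p2 + $lambda3 * $p1;
}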
22 Laplace's Rule
P_Laplace(w_{i+1} | w_i) = (C(w_i, w_{i+1}) + 1) / Σ_w (C(w_i, w) + 1) = (C(w_i, w_{i+1}) + 1) / (C(w_i) + Card(V)).
[Table: for each bigram w_i, w_{i+1} of the sentence, C(w_i, w_{i+1}) + 1, C(w_i) + Card(V), and P_Lap(w_{i+1} | w_i).]
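A possible sketch of the Laplace estimate in Perl, using the %frequency and %frequency_bigrams hashes from the counting programs; the vocabulary size Card(V) is taken here as the number of word types, which is an assumption:
$card_v = scalar keys %frequency;    # Card(V): number of word types

sub p_laplace {
    my ($prev, $word) = @_;
    my $c_bigram = $frequency_bigrams{"$prev $word"} // 0;
    my $c_prev = $frequency{$prev} // 0;
    return ($c_bigram + 1) / ($c_prev + $card_v);
}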
23 Good-Turing
Laplace's rule shifts an enormous mass of probability to very unlikely bigrams. Good-Turing's estimation is more effective.
Let us denote N_c the number of n-grams that occurred exactly c times in the corpus: N_0 is the number of unseen n-grams, N_1 the number of n-grams seen once, N_2 the number of n-grams seen twice, etc.
The frequency of n-grams occurring c times is re-estimated as:
c* = (c + 1) E(N_{c+1}) / E(N_c).
Unseen n-grams: c* = N_1 / N_0, and n-grams seen once: c* = 2 N_2 / N_1.
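A possible sketch computing the N_c counts and the re-estimated c* from the bigram frequencies; it uses the raw counts N_c in place of the expectations E(N_c), which is only reliable for small c, and the hash names are illustrative:
# N_c: number of distinct bigrams occurring exactly c times
foreach $bigram (keys %frequency_bigrams) {
    $n_c{$frequency_bigrams{$bigram}}++;
}
# N_0: possible bigrams never seen, given the number of word types
$n_c{0} = (scalar keys %frequency) ** 2 - scalar keys %frequency_bigrams;

# c* = (c + 1) N_{c+1} / N_c
foreach $c (sort { $a <=> $b } keys %n_c) {
    next unless $n_c{$c + 1};
    printf "%d\t%d\t%.4f\n", $c, $n_c{$c}, ($c + 1) * $n_c{$c + 1} / $n_c{$c};
}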
24 Good-Turing for Nineteen Eighty-Four
Nineteen Eighty-Four contains 37,365 unique bigrams and 5,820 bigrams seen twice.
Its vocabulary of 8,635 words generates 8,635^2 = 74,563,225 possible bigrams, of which 74,513,701 are unseen.
Unseen bigrams: c* = N_1 / N_0 = 37,365 / 74,513,701 ≈ 0.0005.
Unique bigrams: c* = 2 N_2 / N_1 = 2 × 5,820 / 37,365 ≈ 0.31.
[Table: frequency of occurrence c, N_c, and re-estimated c* for c = 0, 1, 2, ...]
25 Backoff
If there is no bigram, then use unigrams:
P_Backoff(w_i | w_{i-1}) = P(w_i | w_{i-1}),  if C(w_{i-1}, w_i) ≠ 0,
                         = α P(w_i),          otherwise.
With maximum likelihood estimates:
P_Backoff(w_i | w_{i-1}) = P_MLE(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1}),  if C(w_{i-1}, w_i) ≠ 0,
                         = P_MLE(w_i) = C(w_i) / #words,                         otherwise.
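A sketch of this simple, unnormalized backoff in Perl, with %frequency, %frequency_bigrams, and the total token count $nbr_words assumed from the earlier programs (the variable names are assumptions):
sub p_backoff {
    my ($prev, $word) = @_;
    if ($frequency_bigrams{"$prev $word"}) {
        return $frequency_bigrams{"$prev $word"} / $frequency{$prev};
    }
    return ($frequency{$word} // 0) / $nbr_words;    # back off to the unigram
}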
26 Backoff: Example
[Table: bigrams of the sentence with C(w_{i-1}, w_i), C(w_i), and P_Backoff(w_i | w_{i-1}); the unseen bigrams good deal, was indeed, indeed already, already being, being transformed, and transformed in back off to the unigram estimate; C(<s>) = 7072.]
The figures we obtain are not probabilities. We can use the Good-Turing technique to discount the bigrams and then scale the unigram probabilities. This is the Katz backoff.
27 Quality of a Language Model
Per-word probability of a word sequence:
H(L) = -(1/n) log2 P(w_1, ..., w_n).
Entropy rate:
H_rate = -(1/n) Σ_{w_1,...,w_n ∈ L} p(w_1, ..., w_n) log2 p(w_1, ..., w_n).
Cross entropy:
H(p, m) = -(1/n) Σ_{w_1,...,w_n ∈ L} p(w_1, ..., w_n) log2 m(w_1, ..., w_n).
We have:
H(p, m) = lim_{n→∞} -(1/n) Σ_{w_1,...,w_n ∈ L} p(w_1, ..., w_n) log2 m(w_1, ..., w_n)
        = lim_{n→∞} -(1/n) log2 m(w_1, ..., w_n).
We compute the cross entropy on the complete word sequence of a test set, governed by p, using a bigram or trigram model, m, obtained from a training set.
Perplexity: PP(p, m) = 2^{H(p, m)}.
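A sketch of this computation in Perl for a bigram model, using the p_backoff function above on a tokenized test set @test_words; both names are carried over from the previous sketches, and every test bigram is assumed to receive a nonzero probability (for instance after mapping unknown words to <UNK> as on slide 18):
$log_prob = 0;
for ($i = 1; $i <= $#test_words; $i++) {
    $log_prob += log(p_backoff($test_words[$i - 1], $test_words[$i])) / log(2);
}
$cross_entropy = -$log_prob / $#test_words;    # n - 1 bigram events
$perplexity = 2 ** $cross_entropy;
print "Cross entropy: $cross_entropy\nPerplexity: $perplexity\n";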
28 Other Statistical Formulas
Mutual information (the strength of an association):
I(w_i, w_j) = log2 [ P(w_i, w_j) / (P(w_i) P(w_j)) ] ≈ log2 [ N C(w_i, w_j) / (C(w_i) C(w_j)) ].
T-score (the confidence of an association):
t(w_i, w_j) = [ mean(P(w_i, w_j)) - mean(P(w_i)) mean(P(w_j)) ] / sqrt( σ²(P(w_i, w_j)) + σ²(P(w_i) P(w_j)) )
            ≈ [ C(w_i, w_j) - (1/N) C(w_i) C(w_j) ] / sqrt( C(w_i, w_j) ).
29 T-Scores with Word set
[Table: bigrams set + up, set + a, set + to, set + off, and set + out, with word frequencies, bigram frequencies, and t-scores.]
Source: Bank of English
30 Mutual Information with Word surgery
[Table: bigrams word + surgery for the words arthroscopic, pioneeing, reconstructive, refractive, and rhinoplasty, with word frequencies, bigram frequencies, and mutual information.]
Source: Bank of English
31 Mutual Information and T-Scores in Perl
@words = split(/\n/, $text);
for ($i = 0; $i < $#words; $i++) {
    $bigrams[$i] = $words[$i] . " " . $words[$i + 1];
}
for ($i = 0; $i <= $#words; $i++) {
    $frequency{$words[$i]}++;
}
for ($i = 0; $i < $#words; $i++) {
    $frequency_bigrams{$bigrams[$i]}++;
}
32 Mutual Information in Perl
for ($i = 0; $i < $#words; $i++) {
    $mutual_info{$bigrams[$i]} = log(($#words + 1) *
        $frequency_bigrams{$bigrams[$i]} /
        ($frequency{$words[$i]} * $frequency{$words[$i + 1]})) / log(2);
}
foreach $bigram (keys %mutual_info) {
    @bigram_array = split(/ /, $bigram);
    print $mutual_info{$bigram}, " ", $bigram, "\t",
        $frequency_bigrams{$bigram}, "\t",
        $frequency{$bigram_array[0]}, "\t",
        $frequency{$bigram_array[1]}, "\n";
}
33 T-Scores in Perl
for ($i = 0; $i < $#words; $i++) {
    $t_scores{$bigrams[$i]} =
        ($frequency_bigrams{$bigrams[$i]} -
         $frequency{$words[$i]} * $frequency{$words[$i + 1]} / ($#words + 1)) /
        sqrt($frequency_bigrams{$bigrams[$i]});
}
foreach $bigram (keys %t_scores) {
    @bigram_array = split(/ /, $bigram);
    print $t_scores{$bigram}, " ", $bigram, "\t",
        $frequency_bigrams{$bigram}, "\t",
        $frequency{$bigram_array[0]}, "\t",
        $frequency{$bigram_array[1]}, "\n";
}
34 Information Retrieval: The Vector Space Model
The vector space model is a technique to compute the similarity of two documents or to match a document and a query.
The vector space model represents a document in a word space:

Documents \ Words   w_1          w_2          w_3          ...   w_m
D_1                 C(w_1, D_1)  C(w_2, D_1)  C(w_3, D_1)  ...   C(w_m, D_1)
D_2                 C(w_1, D_2)  C(w_2, D_2)  C(w_3, D_2)  ...   C(w_m, D_2)
...
D_n                 C(w_1, D_n)  C(w_2, D_n)  C(w_3, D_n)  ...   C(w_m, D_n)

We compute the similarity of two documents through their dot product.
35 The Vector Space Model: Example
A collection of two documents D1 and D2:
D1: Chrysler plans new investments in Latin America.
D2: Chrysler plans major investments in Mexico.
The vectors representing the two documents:

D    america  chrysler  in  investments  latin  major  mexico  new  plans
D1   1        1         1   1            1      0      0       1    1
D2   0        1         1   1            0      1      1       0    1

The vector space model represents documents as bags of words (BOW) that do not take the word order into account.
The dot product is D1 · D2 = 1 + 1 + 1 + 1 = 4.
Their cosine is D1 · D2 / (||D1|| ||D2||) = 4 / (√7 × √6) ≈ 0.62.
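A small Perl sketch of this computation, with the two documents represented as word-count hashes; the hash values simply mirror the table above:
%d1 = (chrysler => 1, plans => 1, new => 1, investments => 1,
       in => 1, latin => 1, america => 1);
%d2 = (chrysler => 1, plans => 1, major => 1, investments => 1,
       in => 1, mexico => 1);

$dot = 0;
foreach $word (keys %d1) {
    $dot += $d1{$word} * ($d2{$word} // 0);
}
$norm1 += $_ ** 2 foreach values %d1;
$norm2 += $_ ** 2 foreach values %d2;
$cosine = $dot / (sqrt($norm1) * sqrt($norm2));
print "Dot product: $dot, cosine: $cosine\n";    # 4 and about 0.62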
36 Giving a Weight
Word clouds give visual weights to words.
Image: Courtesy of Jonas Wisbrant
37 TF IDF
The frequency alone might be misleading.
Document coordinates are in fact tf × idf: term frequency by inverted document frequency.
Term frequency tf_{i,j}: frequency of term j in document i.
Inverted document frequency: idf_j = log(N / n_j).
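A possible sketch of the weighting in Perl, assuming a nested hash %tf holding the raw term counts per document ($tf{$doc}{$term}); these names are illustrative, not from the slides:
@docs = keys %tf;
$nbr_docs = scalar @docs;

# Document frequency n_j: in how many documents term j occurs
foreach $doc (@docs) {
    $df{$_}++ foreach keys %{ $tf{$doc} };
}

# tf.idf coordinate of each term in each document
foreach $doc (@docs) {
    foreach $term (keys %{ $tf{$doc} }) {
        $tfidf{$doc}{$term} = $tf{$doc}{$term} * log($nbr_docs / $df{$term});
    }
}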
38 Document Similarity
Documents are vectors where coordinates could be the count of each word:
d = (C(w_1), C(w_2), C(w_3), ..., C(w_n)).
The similarity between two documents, or a query and a document, is given by their cosine:
cos(q, d) = Σ_{i=1}^{n} q_i d_i / ( sqrt(Σ_{i=1}^{n} q_i²) × sqrt(Σ_{i=1}^{n} d_i²) ).
Application: Lucene, Wikipedia.
39 Inverted Index (Source: Apple)
UserExperience/Conceptual/SearchKitConcepts/index.html