9/10/18. CS 6120/CS4120: Natural Language Processing: Probabilistic Language Models and N-grams


Outline
CS 6120/CS4120: Natural Language Processing
Instructor: Prof. Lu Wang
College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing
[Modified from Dan Jurafsky's slides]

Probabilistic Language Models
Assign a probability to a sentence.
Machine Translation: P(high winds tonight) > P(large winds tonight)
Spell Correction: "The office is about fifteen minuets from my house"; P(about fifteen minutes from) > P(about fifteen minuets from)
Speech Recognition: P(I saw a van) >> P(eyes awe of an)
Text generation in general: summarization, question-answering

Probabilistic Language Modeling
Goal: compute the probability of a sentence or sequence of words: P(W) = P(w1, w2, w3, w4, w5, ..., wn)
Related task: probability of an upcoming word: P(w5 | w1, w2, w3, w4)
A model that computes either of these, P(W) or P(wn | w1, w2, ..., wn-1), is called a language model.
A better name would be "the grammar", but "language model" (or LM) is standard.

How to compute P(W)
How to compute this joint probability: P(its, water, is, so, transparent, that)?
Intuition: let's rely on the Chain Rule of Probability.

Quick Review: Probability
Recall the definition of conditional probabilities: P(B | A) = P(A, B) / P(A)
Rewriting: P(A, B) = P(A) P(B | A)
More variables: P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C)

The Chain Rule in General
P(x1, x2, x3, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... P(xn | x1, ..., xn-1)

The Chain Rule applied to compute the joint probability of words in a sentence
P(w1 w2 ... wn) = ∏_i P(wi | w1 w2 ... wi-1)
P("its water is so transparent") = P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so)

How to estimate these probabilities
Could we just count and divide?
P(the | its water is so transparent that) = Count(its water is so transparent that the) / Count(its water is so transparent that)
No! Too many possible sentences! We'll never see enough data for estimating these.

Markov Assumption
Simplifying assumption: P(the | its water is so transparent that) ≈ P(the | that)
Or maybe: P(the | its water is so transparent that) ≈ P(the | transparent that)
In general: P(w1 w2 ... wn) ≈ ∏_i P(wi | wi-k ... wi-1)
In other words, we approximate each component in the product: P(wi | w1 ... wi-1) ≈ P(wi | wi-k ... wi-1)

Simplest case: Unigram model
P(w1 w2 ... wn) ≈ ∏_i P(wi)
Some automatically generated sentences from a unigram model:
"fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass"
"thrift, did, eighty, said, hard, 'm, july, bullish"
"that, or, limited, the"

Bigram model
Condition on the previous word: P(wi | w1 w2 ... wi-1) ≈ P(wi | wi-1)
Some automatically generated sentences from a bigram model:
"texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen"
"outside, new, car, parking, lot, of, the, agreement, reached"
"this, would, be, a, record, november"

N-gram models
We can extend to trigrams, 4-grams, 5-grams.
In general this is an insufficient model of language, because language has long-distance dependencies:
"The computer(s) which I had just put into the machine room on the fifth floor is (are) crashing."
But we can often get away with N-gram models.

Today's Outline
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing

Estimating bigram probabilities
The Maximum Likelihood Estimate for a bigram probability:
P(wi | wi-1) = count(wi-1, wi) / count(wi-1) = c(wi-1, wi) / c(wi-1)
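As a concrete illustration of the maximum likelihood estimate above, here is a minimal Python sketch (my own illustration, not the course's reference implementation) that counts unigrams and bigrams in a tokenized corpus and computes P(wi | wi-1) = c(wi-1, wi) / c(wi-1). The <s> and </s> boundary padding follows the convention used in the example on the next page.

```python
from collections import Counter

def train_bigram_mle(sentences):
    """Estimate unsmoothed bigram probabilities by maximum likelihood.
    `sentences` is a list of lists of tokens (without boundary markers)."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    # P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})
    probs = {(prev, w): c / unigram_counts[prev]
             for (prev, w), c in bigram_counts.items()}
    return probs, unigram_counts, bigram_counts
```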

An example
P(wi | wi-1) = c(wi-1, wi) / c(wi-1)
Training corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
From these counts, for example: P(I | <s>) = 2/3, P(am | I) = 2/3, P(Sam | am) = 1/2, P(</s> | Sam) = 1/2.

More examples: Berkeley Restaurant Project sentences
can you tell me about any good cantonese restaurants close by
mid priced thai food is what i'm looking for
tell me about chez panisse
can you give me a listing of the kinds of food that are available
i'm looking for a good place to eat breakfast
when is caffe venezia open during the day

Raw bigram counts
(Table of bigram counts, out of 9222 sentences.)

Raw bigram probabilities
Normalize by unigrams (table of unigram counts); result: table of bigram probabilities.

Bigram estimates of sentence probabilities
P(<s> I want english food </s>) = P(I | <s>) × P(want | I) × P(english | want) × P(food | english) × P(</s> | food)
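To make the arithmetic above reproducible, here is a small self-contained Python sketch (my own illustration, not from the slides) that builds the counts from the three-sentence corpus and multiplies bigram probabilities to score a padded sentence.

```python
from collections import Counter

corpus = [["I", "am", "Sam"],
          ["Sam", "I", "am"],
          ["I", "do", "not", "like", "green", "eggs", "and", "ham"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """Unsmoothed MLE bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("I", "<s>"), p("am", "I"))   # 2/3 and 2/3

def sentence_prob(sent):
    """Probability of a sentence as a product of its bigram probabilities."""
    tokens = ["<s>"] + sent + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= p(word, prev)
    return prob

print(sentence_prob(["I", "am", "Sam"]))   # 2/3 * 2/3 * 1/2 * 1/2
```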

Knowledge
P(english | want) = .0011
P(chinese | want) = .0065
P(to | want) = .66
P(eat | to) = .28
P(food | to) = 0
P(want | spend) = 0
P(i | <s>) = .25

Practical Issues
We do everything in log space:
- avoids underflow
- (also, adding is faster than multiplying)
log(p1 × p2 × p3 × p4) = log p1 + log p2 + log p3 + log p4

Language Modeling Toolkits
SRILM
Google N-Gram Release, August 2006

Google N-Gram Release
serve as the incoming 92
serve as the incubator 99
serve as the independent 794
serve as the index 223
serve as the indication 72
serve as the indicator 120
serve as the indicators 45
serve as the indispensable 111
serve as the indispensible 40
serve as the individual 234

Today's Outline
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing
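A minimal sketch of the log-space trick: sum log probabilities instead of multiplying raw probabilities, so very small products do not underflow. The probability values below are made-up numbers for illustration only.

```python
import math

probs = [0.25, 0.66, 0.0011, 0.28]   # illustrative per-word probabilities

# A naive product can underflow for long sentences; the log-space sum does not.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)             # log P(sentence)
print(math.exp(log_prob))   # back to a probability if ever needed
```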

Evaluation: How good is our model?
Does our language model prefer good sentences to bad ones?
Does it assign higher probability to "real" or frequently observed sentences than to ungrammatical or rarely observed sentences?
We train the parameters of our model on a training set.
We test the model's performance on data we haven't seen:
- A test set is an unseen dataset that is different from our training set, totally unused.
- An evaluation metric tells us how well our model does on the test set.

Training on the test set
We can't allow test sentences into the training set; we would assign them an artificially high probability when we see them in the test set.
Training on the test set is bad science! (And violates the honor code.)

Extrinsic evaluation of N-gram models
Best evaluation for comparing models A and B:
- Put each model in a task (spelling corrector, speech recognizer, MT system).
- Run the task and get an accuracy for A and for B (how many misspelled words were corrected properly, how many words were translated correctly).
- Compare the accuracy of A and B.

Difficulty of extrinsic evaluation of N-gram models
Extrinsic evaluation is time-consuming; it can take days or weeks.
So sometimes we use an intrinsic evaluation: perplexity.
This is a bad approximation unless the test data looks just like the training data, so it is generally only useful in pilot experiments.
But it is helpful to think about.

Intuition of Perplexity
The Shannon Game: How well can we predict the next word?
- I always order pizza with cheese and ____
- The 33rd President of the US was ____
- I saw a ____
Unigrams are terrible at this game. (Why?)
For example, after "I always order pizza with cheese and": mushrooms 0.1, pepperoni 0.1, anchovies 0.01, ..., fried rice ..., and 1e-100.
A better model of a text is one which assigns a higher probability to the word that actually occurs.

Perplexity
The best language model is one that best predicts an unseen test set (gives the highest P(sentence)).
Perplexity is the inverse probability of the test set, normalized by the number of words:
PP(W) = P(w1 w2 ... wN)^(-1/N), i.e. the N-th root of 1 / P(w1 w2 ... wN)
Chain rule: PP(W) = ( ∏_i 1 / P(wi | w1 ... wi-1) )^(1/N)
For bigrams: PP(W) = ( ∏_i 1 / P(wi | wi-1) )^(1/N)
Minimizing perplexity is the same as maximizing probability.

Perplexity as branching factor
Let's suppose a sentence consisting of random digits. What is the perplexity of this sentence according to a model that assigns P = 1/10 to each digit?
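The formula above is computed stably in log space. Below is a minimal sketch, assuming we already have the per-word conditional probabilities that the model assigns to a test sequence; perplexity is then the exponential of the negative average log probability, which is algebraically the same as the N-th root expression.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test sequence given the model's per-word
    conditional probabilities P(w_i | history). Computed in log space:
    PP = exp(-(1/N) * sum_i log P(w_i | history))."""
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / n)

# Illustrative numbers only: a model that assigns 1/4 to every word
print(perplexity([0.25] * 12))   # 4.0
```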

Perplexity as branching factor
Let's suppose a sentence consisting of random digits. What is the perplexity of this sentence according to a model that assigns P = 1/10 to each digit?
PP(W) = ( (1/10)^N )^(-1/N) = 10

Lower perplexity = better model
Training on 38 million words, testing on 1.5 million words of WSJ, the reported perplexities by n-gram order are 962 (unigram), 170 (bigram), and 109 (trigram).

Today's Outline
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing

The perils of overfitting
N-grams only work well for word prediction if the test corpus looks like the training corpus.
In real life, it often doesn't.
We need to train robust models that generalize!
One kind of generalization: zeros! Things that don't ever occur in the training set but do occur in the test set.

Zeros
In the training set, we see:
... denied the allegations
... denied the reports
... denied the claims
... denied the request
But in the test set:
... denied the offer
... denied the loan
P(offer | denied the) = 0
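A quick numerical check of the branching-factor claim, reusing the same log-space perplexity computation: a model that assigns probability 1/10 to every digit has perplexity exactly 10, regardless of sentence length.

```python
import math

def perplexity(word_probs):
    # PP = exp(-(1/N) * sum_i log P(w_i | history))
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

for n in (5, 20, 100):             # different sentence lengths
    digits = [0.1] * n             # the model assigns P = 1/10 to each digit
    print(n, perplexity(digits))   # always 10.0, up to float rounding
```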

Zero probability bigrams
Bigrams with zero probability mean that we will assign 0 probability to the test set!
And hence we cannot compute perplexity (we can't divide by 0)!

Today's Outline
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing

The intuition of smoothing (from Dan Klein)
When we have sparse statistics, e.g. counts for P(w | denied the): 3 allegations, 2 reports, 1 claims, 1 request (7 total),
we steal probability mass to generalize better: 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other (7 total).

Add-one estimation
Also called Laplace smoothing.
Pretend we saw each word one more time than we did: just add one to all the counts! (Instead of taking counts away.)
MLE estimate: P_MLE(wi | wi-1) = c(wi-1, wi) / c(wi-1)
Add-1 estimate: P_Add-1(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + V)
Why add V?

Berkeley Restaurant Corpus: Laplace smoothed bigram counts
(Table of add-1 smoothed bigram counts.)
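A minimal sketch of the add-1 estimate above, built on plain bigram and unigram counts. Adding V (the vocabulary size) to the denominator is what keeps the smoothed probabilities summing to 1 over the vocabulary. The counts and vocabulary size in the usage example are made up.

```python
from collections import Counter

def laplace_bigram_prob(word, prev, bigram_counts, unigram_counts, vocab_size):
    """P_Add-1(word | prev) = (c(prev, word) + 1) / (c(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

# Toy usage with made-up counts
unigrams = Counter({"denied": 10, "the": 30})
bigrams = Counter({("denied", "the"): 7})
V = 1000   # assumed vocabulary size
print(laplace_bigram_prob("the", "denied", bigrams, unigrams, V))    # (7+1)/(10+1000)
print(laplace_bigram_prob("offer", "denied", bigrams, unigrams, V))  # (0+1)/(10+1000), no longer zero
```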

Laplace-smoothed bigrams
(Table of Laplace-smoothed bigram probabilities and reconstituted counts.)

Add-1 estimation is a blunt instrument
So add-1 isn't used for N-grams: we'll see better methods. (Nowadays neural LMs have become popular; we will discuss them later.)
But add-1 is used to smooth other NLP models:
- for text classification (coming soon!)
- in domains where the number of zeros isn't so huge.

Today's Outline
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing

Backoff and Interpolation
Sometimes it helps to use less context: condition on less context for contexts you haven't learned much about.
Backoff: use the trigram if you have good evidence, otherwise the bigram, otherwise the unigram.
Interpolation: mix unigram, bigram, and trigram.
In general, interpolation works better.

Linear Interpolation
Simple interpolation:
P̂(wn | wn-2 wn-1) = λ1 P(wn | wn-2 wn-1) + λ2 P(wn | wn-1) + λ3 P(wn), with ∑_i λi = 1

How to set the lambdas?
Use a held-out corpus (split the data into training data, held-out data, and test data).
Choose the λs to maximize the probability of the held-out data:
- Fix the N-gram probabilities (on the training data).
- Then search for the λs that give the largest probability to the held-out set:
log P(w1 ... wn | M(λ1 ... λk)) = ∑_i log P_{M(λ1 ... λk)}(wi | wi-1)
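A minimal sketch of simple linear interpolation with fixed lambdas (the lambda values below are placeholders, not tuned on held-out data). It assumes MLE unigram, bigram, and trigram estimates are available as callables; the hard-coded probability functions in the usage example are stand-ins.

```python
def interpolated_prob(word, prev2, prev1, p_uni, p_bi, p_tri,
                      lambdas=(0.2, 0.3, 0.5)):
    """Simple linear interpolation:
    P_hat(w_n | w_{n-2} w_{n-1}) =
        l_tri * P(w_n | w_{n-2} w_{n-1}) + l_bi * P(w_n | w_{n-1}) + l_uni * P(w_n),
    with the lambdas summing to 1 (placeholder values here)."""
    l_uni, l_bi, l_tri = lambdas
    return (l_tri * p_tri(word, prev2, prev1)
            + l_bi * p_bi(word, prev1)
            + l_uni * p_uni(word))

# Toy usage with hard-coded estimates standing in for real MLE models
p_uni = lambda w: {"food": 0.01}.get(w, 0.001)
p_bi = lambda w, prev1: {("want", "food"): 0.3}.get((prev1, w), 0.0)
p_tri = lambda w, prev2, prev1: 0.0   # unseen trigram
print(interpolated_prob("food", "i", "want", p_uni, p_bi, p_tri))
```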

A Common Method: Grid Search
Take a list of possible values, e.g. [0.1, 0.2, ..., 0.9].
Try all combinations.

Linear Interpolation
Simple interpolation: P̂(wn | wn-2 wn-1) = λ1 P(wn | wn-2 wn-1) + λ2 P(wn | wn-1) + λ3 P(wn), with ∑_i λi = 1
Lambdas conditional on context: each λ can depend on the context wn-2 wn-1.
How to estimate them?

Unknown words: Open versus closed vocabulary tasks
If we know all the words in advance, the vocabulary V is fixed: a closed vocabulary task.
Often we don't know this: Out Of Vocabulary (OOV) words, an open vocabulary task.
Instead: create an unknown word token <UNK>.
Training of <UNK> probabilities:
- Create a fixed lexicon L of size V (e.g. by selecting high-frequency words).
- At the text normalization phase, any training word not in L is changed to <UNK>.
- Now we train its probabilities like a normal word.
At test time: use the <UNK> probabilities for any word not seen in training.

Smoothing for Web-scale N-grams
"Stupid backoff" (Brants et al.): no discounting, just use relative frequencies.
S(wi | wi-k+1 ... wi-1) = count(wi-k+1 ... wi) / count(wi-k+1 ... wi-1) if count(wi-k+1 ... wi) > 0, otherwise 0.4 × S(wi | wi-k+2 ... wi-1)
Back off until the unigram level: S(wi) = count(wi) / N.
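A minimal recursive sketch of stupid backoff as described above: no discounting, just relative frequencies, backing off with a constant factor (0.4 here, following the description) until the unigram relative frequency. Counts are assumed to be stored in a dictionary keyed by n-gram tuples; the counts in the usage example are made up.

```python
def stupid_backoff(ngram, counts, total_tokens, alpha=0.4):
    """Score S(w_i | w_{i-k+1} ... w_{i-1}) for ngram = (w_{i-k+1}, ..., w_i).
    Note: S is a score, not a true probability (it is not normalized)."""
    if len(ngram) == 1:
        return counts.get(ngram, 0) / total_tokens      # unigram relative frequency
    context = ngram[:-1]
    if counts.get(ngram, 0) > 0 and counts.get(context, 0) > 0:
        return counts[ngram] / counts[context]
    # Back off: drop the earliest context word, apply the constant factor.
    return alpha * stupid_backoff(ngram[1:], counts, total_tokens, alpha)

# Toy usage with made-up counts
counts = {("i", "want", "food"): 0, ("want", "food"): 3, ("want",): 10, ("food",): 5}
print(stupid_backoff(("i", "want", "food"), counts, total_tokens=1000))  # 0.4 * 3/10
```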

Today's Outline
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing

Absolute discounting: just subtract a little from each count
Suppose we wanted to subtract a little from a count of 4 to save probability mass for the zeros.
How much to subtract?
Church and Gale (1991)'s clever idea: divide up 22 million words of AP Newswire into a training set and a held-out set; for each bigram in the training set, see the actual count in the held-out set.
(Table: bigram count in training vs. average bigram count in the held-out set.)
It sure looks like c* = (c - .75).

Absolute Discounting Interpolation
Save ourselves some time and just subtract 0.75 (or some d)!
P_AbsoluteDiscounting(wi | wi-1) = (c(wi-1, wi) - d) / c(wi-1)  [discounted bigram]  + λ(wi-1) P(wi)  [interpolation weight × unigram]
But should we really just use the regular unigram P(wi)?

Kneser-Ney Smoothing I
Better estimate for the probabilities of lower-order unigrams!
Shannon game: "I can't see without my reading ____." "Francisco" is more common than "glasses", but "Francisco" always follows "San".
The unigram is useful exactly when we haven't seen this bigram!
Instead of P(w) ("how likely is w"), use P_continuation(w): how likely is w to appear as a novel continuation?
For each word, count the number of unique bigrams it completes; every unique bigram was a novel continuation the first time it was seen.
P_CONTINUATION(w) ∝ |{wi-1 : c(wi-1, w) > 0}|  (the number of unique bigrams w completes)

Kneser-Ney Smoothing II
How many times does w appear as a novel continuation (unique bigrams), normalized by the total number of word bigram types (all unique bigrams in the corpus):
P_CONTINUATION(w) = |{wi-1 : c(wi-1, w) > 0}| / |{(wj-1, wj) : c(wj-1, wj) > 0}|

Kneser-Ney Smoothing III
Alternative metaphor: the number of unique words seen to precede w, |{wi-1 : c(wi-1, w) > 0}|, normalized by the number of words preceding all words:
P_CONTINUATION(w) = |{wi-1 : c(wi-1, w) > 0}| / ∑_{w'} |{w'i-1 : c(w'i-1, w') > 0}|
A frequent word ("Francisco") occurring in only one context ("San") will have a low continuation probability.
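A minimal sketch of the continuation probability defined above: for each word, count the distinct left contexts it appears with, and normalize by the total number of distinct bigram types. The bigram table in the usage example is made up to mirror the San Francisco / glasses example.

```python
from collections import defaultdict

def continuation_probs(bigram_counts):
    """P_CONTINUATION(w) = |{w_prev : c(w_prev, w) > 0}| / (number of bigram types).
    `bigram_counts` maps (w_prev, w) pairs to counts."""
    left_contexts = defaultdict(set)
    for (w_prev, w), c in bigram_counts.items():
        if c > 0:
            left_contexts[w].add(w_prev)
    total_bigram_types = sum(len(ctx) for ctx in left_contexts.values())
    return {w: len(ctx) / total_bigram_types for w, ctx in left_contexts.items()}

# Toy usage: "Francisco" follows only "San"; "glasses" follows several words
bigrams = {("San", "Francisco"): 50, ("reading", "glasses"): 2,
           ("my", "glasses"): 1, ("sun", "glasses"): 3}
print(continuation_probs(bigrams))   # glasses gets 3/4, Francisco only 1/4
```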

Kneser-Ney Smoothing IV
P_KN(wi | wi-1) = max(c(wi-1, wi) - d, 0) / c(wi-1) + λ(wi-1) P_CONTINUATION(wi)
λ is a normalizing constant: the probability mass we've discounted,
λ(wi-1) = (d / c(wi-1)) × |{w : c(wi-1, w) > 0}|
Here d / c(wi-1) is the normalized discount, and |{w : c(wi-1, w) > 0}| is the number of word types that can follow wi-1 (= the number of word types we discounted = the number of times we applied the normalized discount).

Kneser-Ney Smoothing: Recursive formulation
P_KN(wi | wi-n+1 ... wi-1) = max(c_KN(wi-n+1 ... wi) - d, 0) / c_KN(wi-n+1 ... wi-1) + λ(wi-n+1 ... wi-1) P_KN(wi | wi-n+2 ... wi-1)
where c_KN(•) = count(•) for the highest order, and continuation count(•) for lower orders.
Continuation count = the number of unique single-word contexts for •.

Language Modeling Homework
- Probabilistic language model and n-grams
- Estimating n-gram probabilities
- Language model evaluation and perplexity
- Generalization and zeros
- Smoothing: add-one
- Interpolation, backoff, and web-scale LMs
- Smoothing: Kneser-Ney Smoothing
Reading: J&M ch1 and ch
Start thinking about the course project and find a team.
Project proposal due Oct 1st. The format of the proposal will be posted on Piazza.
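Putting the pieces above together, here is a minimal sketch of the interpolated Kneser-Ney bigram estimate (the bigram level only, not the full recursive formulation): absolute discounting of the bigram count, with the left-over mass λ(wi-1) spread according to the continuation probability. The discount d = 0.75 follows the Church and Gale observation quoted earlier; the bigram table in the usage example is made up.

```python
from collections import defaultdict

def kneser_ney_bigram(word, prev, bigram_counts, d=0.75):
    """P_KN(word | prev) = max(c(prev, word) - d, 0) / c(prev)
                           + lambda(prev) * P_CONTINUATION(word)."""
    # Counts and type statistics derived from the bigram table
    c_prev = sum(c for (p, _), c in bigram_counts.items() if p == prev)
    followers = {w for (p, w), c in bigram_counts.items() if p == prev and c > 0}
    left_contexts = defaultdict(set)
    for (p, w), c in bigram_counts.items():
        if c > 0:
            left_contexts[w].add(p)
    total_bigram_types = sum(len(s) for s in left_contexts.values())

    discounted = max(bigram_counts.get((prev, word), 0) - d, 0) / c_prev
    lam = (d / c_prev) * len(followers)                      # discounted probability mass
    p_cont = len(left_contexts[word]) / total_bigram_types   # continuation probability
    return discounted + lam * p_cont

bigrams = {("San", "Francisco"): 50, ("reading", "glasses"): 2,
           ("my", "glasses"): 1, ("sun", "glasses"): 3}
print(kneser_ney_bigram("glasses", "reading", bigrams))
print(kneser_ney_bigram("Francisco", "reading", bigrams))   # low, despite the frequent unigram
```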
