Language Modeling. Mausam. (Based on slides of Michael Collins, Dan Jurafsky, Dan Klein, Chris Manning, Luke Zettlemoyer)


1 Language Modeling. Mausam. (Based on slides of Michael Collins, Dan Jurafsky, Dan Klein, Chris Manning, Luke Zettlemoyer)

2 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

3 The Language Modeling Problem. Setup: Assume a finite vocabulary of words. We can construct an infinite set of strings. Data: given a training set of example sentences. Problem: estimate a probability distribution.

4 The Noisy-Channel Model. We want to predict a sentence given acoustics. The noisy-channel approach: Acoustic model: distributions over acoustic waves given a sentence. Language model: distributions over sequences of words (sentences).

5 Acoustically Scored Hypotheses: the station signs are in deep in english / the stations signs are in deep in english / the station signs are in deep into english / the station 's signs are in deep in english / the station signs are in deep in the english (-474) / the station signs are indeed in english / the station 's signs are indeed in english / the station signs are indians in english / the station signs are indian in english / the stations signs are indians in english / the stations signs are indians and english (-485)

6 ASR System Components: Language Model + Acoustic Model. Source w with P(w); the channel produces acoustics a with P(a | w); the decoder sees the observed a and picks the best w: w* = argmax_w P(w | a) = argmax_w P(a | w) P(w)

7 Translation: Codebreaking? "Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography methods which I believe succeed even when one does not know what language has been coded one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." Warren Weaver (1955:18, quoting a letter he wrote in 1947)

8 MT System Components: Language Model + Translation Model. Source e with P(e); the channel produces f with P(f | e); the decoder sees the observed f and picks the best e: e* = argmax_e P(e | f) = argmax_e P(f | e) P(e)

9 Probabilistic Language Models: Other Applications. Why assign a probability to a sentence? Machine Translation: P(high winds tonite) > P(large winds tonite). Speech Recognition: P(I saw a van) >> P(eyes awe of an). Spell Correction: "The office is about fifteen minuets from my house"; P(about fifteen minutes from) > P(about fifteen minuets from). Plus summarization, question-answering, etc., etc.!!

10 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

11 Probabilistic Language Modeling. Goal: compute the probability of a sentence or sequence of words: P(W) = P(w1, w2, w3, w4, w5 ... wn). Related task: probability of an upcoming word: P(w5 | w1, w2, w3, w4). A model that computes either of these, P(W) or P(wn | w1, w2 ... wn-1), is called a language model. Better: the grammar. But "language model" or "LM" is standard.

12 How to compute P(W). How to compute this joint probability: P(its, water, is, so, transparent, that). The chain rule: P(its water is so transparent) = P(its) P(water | its) P(is | its water) P(so | its water is) P(transparent | its water is so)
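
The chain-rule decomposition is easy to mirror in code. A tiny sketch (mine, not the lecture's), with a placeholder cond_prob function standing in for whatever conditional estimates a model provides:

```python
def sentence_prob(tokens, cond_prob):
    """P(w1..wn) = prod_i P(wi | w1..wi-1), via the chain rule."""
    prob = 1.0
    for i, w in enumerate(tokens):
        prob *= cond_prob(w, tokens[:i])   # P(w_i | all previous words)
    return prob

# Example with a (useless) model that ignores history entirely:
print(sentence_prob("its water is so transparent".split(),
                    lambda w, hist: 0.1))   # 0.1**5 = 1e-05
```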

13 How to estimate these probabilities. Could we just count and divide? P(the | its water is so transparent that) = Count(its water is so transparent that the) / Count(its water is so transparent that). No! Too many possible sentences! We'll never see enough data for estimating these.

14 Markov Assumption. Simplifying assumption (Andrei Markov): P(the | its water is so transparent that) ≈ P(the | that). Or maybe: P(the | its water is so transparent that) ≈ P(the | transparent that)

15 Markov Assumption. P(w1 w2 ... wn) ≈ ∏_i P(wi | wi-k ... wi-1). In other words, we approximate each component in the product: P(wi | w1 w2 ... wi-1) ≈ P(wi | wi-k ... wi-1)

16 Simplest Case: Unigram Models. Simplest case: unigrams: P(w1 w2 ... wn) ≈ ∏_i P(wi). Generative process: pick a word, pick a word, ... until you pick </s>. Graphical model: w1 w2 ... wn-1 </s>. Examples: fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass; thrift, did, eighty, said, hard, 'm, july, bullish; that, or, limited, the. Big problem with unigrams: P(the the the the) >> P(I like ice cream)!

17 Bigram Models. Condition on the previous single word: P(w1 w2 ... wn) ≈ ∏_i P(wi | wi-1). Generative process: pick <s>, pick a word conditioned on the previous one, repeat until you pick </s>. Graphical model: <s> w1 w2 ... wn-1 </s>. Examples: texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen; outside, new, car, parking, lot, of, the, agreement, reached; this, would, be, a, record, november

18 N-Gram Models. We can extend to trigrams, 4-grams, 5-grams. N-gram models are (weighted) regular languages. Many linguistic arguments that language isn't regular: long-distance effects ("The computer which I had just put into the machine room on the fifth floor."), recursive structure. We often get away with n-gram models. PCFG LM (later): [This, quarter, is, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .] [It, could, be, announced, sometime, .] [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 2, stocks, .]

19 Proof: Unigram LMs are Well-Defined Distributions*. Simplest case: unigrams. Generative process: pick a word, pick a word, ... until you pick </s>. For all strings x (of any length): p(x) ≥ 0. Claim: the sum over strings of all lengths is 1: Σ_x p(x) = 1. Step 1: decompose the sum over length (p(n) is the prob. of a sentence with n words). Step 2: For each length, the inner sum is 1. Step 3: For stopping prob. p_s = P(</s>), we get a geometric series that sums to 1.

20 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

21 Estimating bigram probabilities. The Maximum Likelihood Estimate: P(wi | wi-1) = count(wi-1, wi) / count(wi-1) = c(wi-1, wi) / c(wi-1)

22 An example. P(wi | wi-1) = c(wi-1, wi) / c(wi-1). Corpus: <s> I am Sam </s> / <s> Sam I am </s> / <s> I do not like green eggs and ham </s>
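
To make the maximum likelihood estimate concrete, here is a minimal Python sketch (mine, not the slides') that computes bigram MLE probabilities from the three toy sentences above; the variable names are illustrative.

```python
from collections import Counter

# Toy corpus from the slide, with sentence boundary markers.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sent in corpus:
    tokens = sent.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def p_mle(w, prev):
    """P(w | prev) = c(prev, w) / c(prev)."""
    return bigram_counts[(prev, w)] / unigram_counts[prev]

print(p_mle("I", "<s>"))   # 2/3
print(p_mle("Sam", "am"))  # 1/2
print(p_mle("do", "I"))    # 1/3
```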

23 More examples: Berkeley Restaurant Project sentences. can you tell me about any good cantonese restaurants close by / mid priced thai food is what i'm looking for / tell me about chez panisse / can you give me a listing of the kinds of food that are available / i'm looking for a good place to eat breakfast / when is caffe venezia open during the day

24 Raw bigram counts (out of 9222 sentences)

25 Raw bigram probabilities. Normalize the bigram counts by the unigram counts to get the result (table of bigram probabilities).

26 Bigram estimates of sentence probabilities. P(<s> I want english food </s>) = P(I | <s>) P(want | I) P(english | want) P(food | english) P(</s> | food) = .000031

27 What kinds of knowledge? P(english | want) = .0011, P(chinese | want) = .0065, P(to | want) = .66, P(eat | to) = .28, P(food | to) = 0, P(want | spend) = 0, P(i | <s>) = .25. Some of these reflect world knowledge, some grammatical knowledge.

28 Practical Issues. We do everything in log space. Avoid underflow (also, adding is faster than multiplying): log(p1 × p2 × p3 × p4) = log p1 + log p2 + log p3 + log p4
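
A small illustration (mine, not the slides') of why log space matters: multiplying many small probabilities underflows to 0.0 in floating point, while summing their logs stays well-behaved.

```python
import math

probs = [1e-5] * 100          # 100 word probabilities of 1e-5 each

product = 1.0
for p in probs:
    product *= p              # 1e-500 underflows to 0.0 in a double

log_sum = sum(math.log(p) for p in probs)   # about -1151.3, no underflow

print(product)   # 0.0
print(log_sum)   # -1151.29...
```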

29 Language Modeling Toolkits: SRILM

30 Google N-Gram Release, August 2006

31 Google N-Gram Release: serve as the incoming 92 / serve as the incubator 99 / serve as the independent 794 / serve as the index 223 / serve as the indication 72 / serve as the indicator 120 / serve as the indicators 45 / serve as the indispensable 111 / serve as the indispensible 40 / serve as the individual 234

32 Google Book N-grams

33 When do Noun Phrases start appearing regularly in text? The space of noun phrases: unlinkable non-entity noun phrases vs. unlinkable entities. Hypothesis: in a set of unlinkable noun phrases, phrases going back 100s of years are more often non-entity. Examples with the year they start appearing regularly (source: Google Books ngrams): South America 1762, strange things 1794, Serious change, Abraham Lincoln 1849, Typical ones, apple juice 1865, Computer Science 1943, Bradley Hospital 1976, wheatgrass juice 1983, Hurricane Rita, FY.

34 Temporally-Aware features. Feature: usage frequency in text over time (Google Books n-grams). Example non-entity ("Some people"): slope of the best-fit line over year. Example entity ("Starbucks Coffee"): continuous usage since <year>.

35 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

36 Evaluation: How good is our model? Does our language model prefer good sentences to bad ones? Does it assign higher probability to real or frequently observed sentences than to ungrammatical or rarely observed sentences? We train the parameters of our model on a training set. We test the model's performance on data we haven't seen. A test set is an unseen dataset that is different from our training set, totally unused. An evaluation metric tells us how well our model does on the test set.

37 Extrinsic evaluation of N-gram models. Best evaluation for comparing models A and B: put each model in a task (spelling corrector, speech recognizer, MT system), run the task, get an accuracy for A and for B (how many misspelled words corrected properly, how many words translated correctly), and compare the accuracy of A and B.

38 Difficulty of extrinsic (in-vivo) evaluation of N-gram models. Extrinsic evaluation is time-consuming; it can take days or weeks. So we sometimes use an intrinsic evaluation: perplexity. It is a bad approximation unless the test data looks just like the training data, so it is generally only useful in pilot experiments. But it is helpful to think about.

39 Intuition of Perplexity. The Shannon Game: How well can we predict the next word? "I always order pizza with cheese and ___", "The 33rd President of the US was ___", "I saw a ___". Unigrams are terrible at this game. Why? A better model of a text is one which assigns a higher probability to the word that actually occurs (e.g. mushrooms 0.1, pepperoni 0.1, anchovies 0.01, ..., fried rice 0.0001, ..., and 1e-100).

40 Perplexity. The best language model is one that best predicts an unseen test set (gives the highest P(sentence)). Perplexity is the inverse probability of the test set, normalized by the number of words: PP(W) = P(w1 w2 ... wN)^(-1/N). By the chain rule: PP(W) = (∏_i 1 / P(wi | w1 ... wi-1))^(1/N). For bigrams: PP(W) = (∏_i 1 / P(wi | wi-1))^(1/N). Minimizing perplexity is the same as maximizing probability.
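
The perplexity definition above translates directly into code. A minimal sketch (mine), assuming a bigram probability function like the MLE one sketched earlier and a test sequence with no zero-probability bigrams:

```python
import math

def perplexity(tokens, bigram_prob):
    """PP(W) = P(w1..wN)^(-1/N), computed in log space for stability."""
    log_prob = 0.0
    for prev, w in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(w, prev))
    n = len(tokens) - 1                      # number of predicted words
    return math.exp(-log_prob / n)

# With a uniform model over 10 digits, the perplexity is exactly 10:
digits = "<s> 3 7 1 4 9 2".split()
print(perplexity(digits, lambda w, prev: 1 / 10))   # 10.0
```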

41 The Shannon Game intuition for perplexity (from Josh Goodman). How hard is the task of recognizing digits 0,1,2,3,4,5,6,7,8,9? Perplexity 10. How hard is recognizing 30,000 names at Microsoft? Perplexity = 30,000. If a system has to recognize Operator (1 in 4), Sales (1 in 4), Technical Support (1 in 4), or one of 30,000 names (1 in 120,000 each), the perplexity is 53. Perplexity is the weighted equivalent branching factor.

42 Perplexity as branching factor. Let's suppose a sentence consisting of random digits. What is the perplexity of this sentence according to a model that assigns P = 1/10 to each digit? PP(W) = ((1/10)^N)^(-1/N) = 10.

43 Another form of Perplexity. Lower is better! Example: a uniform model's perplexity is N. Interpretation: effective vocabulary size, accounting for statistical regularities. Typical values for newspaper text: Uniform: 20,000; Unigram: 1000s; Bigram and Trigram: progressively lower. Important note: it's easy to get bogus perplexities by having bogus probabilities that sum to more than one over their event spaces. Be careful!

44 Lower perplexity = better model. Training: 38 million words; test: 1.5 million words (WSJ). Perplexity by N-gram order: Unigram 962, Bigram 170, Trigram 109.

45 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

46 The Shannon Visualization Method. Choose a random bigram (<s>, w) according to its probability. Now choose a random bigram (w, x) according to its probability. And so on until we choose </s>. Then string the words together: <s> I / I want / want to / to eat / eat Chinese / Chinese food / food </s> → I want to eat Chinese food
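
A rough sketch (mine) of the Shannon visualization method: sample one word at a time from a bigram distribution until </s> is drawn. The bigram_probs dictionary here is a made-up toy model, not the Berkeley Restaurant data.

```python
import random

# Toy bigram model: P(next | current), made up for illustration.
bigram_probs = {
    "<s>":     {"I": 1.0},
    "I":       {"want": 1.0},
    "want":    {"to": 0.5, "Chinese": 0.5},
    "to":      {"eat": 1.0},
    "eat":     {"Chinese": 1.0},
    "Chinese": {"food": 1.0},
    "food":    {"</s>": 1.0},
}

def generate(max_len=20):
    word, sentence = "<s>", []
    for _ in range(max_len):
        dist = bigram_probs[word]
        word = random.choices(list(dist), weights=dist.values())[0]
        if word == "</s>":
            break
        sentence.append(word)
    return " ".join(sentence)

print(generate())   # e.g. "I want to eat Chinese food"
```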

47 Approximating Shakespeare

48 Shakespeare as corpus. N = 884,647 tokens, V = 29,066. Shakespeare produced 300,000 bigram types out of V^2 = 844 million possible bigrams. So 99.96% of the possible bigrams were never seen (have zero entries in the table). Quadrigrams are worse: what's coming out looks like Shakespeare because it is Shakespeare.

49 The Wall Street Journal is not Shakespeare (no offense)

50 The perils of overfitting. N-grams only work well for word prediction if the test corpus looks like the training corpus. In real life, it often doesn't. We need to train robust models that generalize! One kind of generalization: zeros! Things that don't ever occur in the training set but occur in the test set.

51 Zeros. Training set: ...denied the allegations, ...denied the reports, ...denied the claims, ...denied the request. Test set: ...denied the offer, ...denied the loan. P(offer | denied the) = 0

52 Zero probability bigrams. Bigrams with zero probability mean that we'll assign 0 probability to the test set! And hence we cannot compute perplexity (can't divide by 0!).

53 The intuition of smoothing. When we have sparse statistics, e.g. P(w | denied the): 3 allegations, 2 reports, 1 claims, 1 request (7 total), we steal probability mass to generalize better: P(w | denied the): 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other (7 total). (Histogram over: allegations, reports, claims, request, attack, man, outcome.)

54 Add-one estimation. Also called Laplace smoothing. Pretend we saw each word one more time than we did; just add one to all the counts! MLE estimate: P_MLE(wi | wi-1) = c(wi-1, wi) / c(wi-1). Add-1 estimate: P_Add-1(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + V)
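
A minimal sketch of the Add-1 estimate, reusing counters like those in the earlier MLE example; the counter contents and names below are illustrative assumptions, not the slides' code.

```python
from collections import Counter

def p_add_one(w, prev, bigram_counts, unigram_counts, vocab_size):
    """P_Add-1(w | prev) = (c(prev, w) + 1) / (c(prev) + V)."""
    return (bigram_counts[(prev, w)] + 1) / (unigram_counts[prev] + vocab_size)

# Tiny example: "the offer" is unseen but no longer gets zero probability.
unigram_counts = Counter({"the": 7})
bigram_counts = Counter({("the", "allegations"): 3, ("the", "reports"): 2,
                         ("the", "claims"): 1, ("the", "request"): 1})
V = 1000
print(p_add_one("offer", "the", bigram_counts, unigram_counts, V))        # 1/1007
print(p_add_one("allegations", "the", bigram_counts, unigram_counts, V))  # 4/1007
```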

55 Berkeley Restaurant Corpus: Laplace-smoothed bigram counts

56 Laplace-smoothed bigrams

57 Reconstituted counts

58 Compare with raw bigram counts

59 More general formulations: Add-k. P_Add-k(wi | wi-1) = (c(wi-1, wi) + k) / (c(wi-1) + kV). Equivalently, with m = kV: P_Add-k(wi | wi-1) = (c(wi-1, wi) + m(1/V)) / (c(wi-1) + m)

60 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

61 Backoff and Interpolation. Sometimes it helps to use less context: condition on less context for contexts you haven't learned much about. Backoff: use the trigram if you have good evidence, otherwise the bigram, otherwise the unigram. Interpolation: mix unigram, bigram, and trigram. Interpolation often works better.

62 Backoff. Divide the words into seen and unseen. Backoff: P_BO(wi | wi-1) = c(wi-1, wi) / c(wi-1) if wi was seen after wi-1; = α P(wi) otherwise. Problem? Not a probability distribution.

63 Katz Backoff. Divide the words into seen and unseen: A(wi-1) = {v : c(wi-1, v) > k}, B(wi-1) = {v : c(wi-1, v) ≤ k}. Backoff: P_BO(wi | wi-1) = P*(wi | wi-1) if wi ∈ A(wi-1); = α(wi-1) P_ML(wi) / Σ_{w ∈ B(wi-1)} P_ML(w) if wi ∈ B(wi-1), where P*(wi | wi-1) = c*(wi-1, wi) / c(wi-1) is a discounted estimate.

64 Linear Interpolation. Simple interpolation: P_interp(wi | wi-2 wi-1) = λ1 P(wi | wi-2 wi-1) + λ2 P(wi | wi-1) + λ3 P(wi), with Σ_j λj = 1. Lambdas conditional on context: λj = λj(wi-2, wi-1).

65 How to set the lambdas? Use a held-out corpus: Training Data | Held-Out Data | Test Data. Choose the λs to maximize the probability of the held-out data: fix the N-gram probabilities on the training data, then search for the λs that give the largest probability to the held-out set: log P(w1 ... wn | M(λ1 ... λk)) = Σ_i log P_{M(λ1 ... λk)}(wi | wi-1)
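
A small sketch (my own, with made-up component models) of simple linear interpolation of a bigram and a unigram model, plus a grid search for the λ pair that maximizes held-out log-probability:

```python
import math

def interp_prob(w, prev, p_bigram, p_unigram, lambdas):
    """P_interp(w | prev) = l1 * P(w | prev) + l2 * P(w), with l1 + l2 = 1."""
    l1, l2 = lambdas
    return l1 * p_bigram(w, prev) + l2 * p_unigram(w)

def heldout_logprob(pairs, p_bigram, p_unigram, lambdas):
    """Log-probability of held-out (prev, w) bigram pairs under the mixture."""
    return sum(math.log(interp_prob(w, prev, p_bigram, p_unigram, lambdas))
               for prev, w in pairs)

def best_lambdas(pairs, p_bigram, p_unigram, step=0.1):
    """Grid search over l1 (with l2 = 1 - l1) to maximize held-out probability."""
    candidates = [(i * step, 1 - i * step) for i in range(1, 10)]
    return max(candidates,
               key=lambda ls: heldout_logprob(pairs, p_bigram, p_unigram, ls))

# Toy demo: the bigram model fits the held-out pairs well, so a large l1 wins.
p_bigram = lambda w, prev: 0.5 if (prev, w) == ("I", "want") else 0.01
p_unigram = lambda w: 0.001
print(best_lambdas([("I", "want")] * 5, p_bigram, p_unigram))   # l1 close to 0.9
```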

66 Add-k + Interpolation. P_Add-k(wi | wi-1) = (c(wi-1, wi) + m(1/V)) / (c(wi-1) + m). Unigram prior: P_UnigramPrior(wi | wi-1) = (c(wi-1, wi) + m P(wi)) / (c(wi-1) + m)

67 Unknown words: Open vs. closed vocabulary tasks. If we know all the words in advance: the vocabulary V is fixed (closed vocabulary task). Often we don't know this: Out Of Vocabulary = OOV words (open vocabulary task). Instead: create an unknown word token <UNK>. Training of <UNK> probabilities: create a fixed lexicon L of size V; at the text normalization phase, any training word not in L is changed to <UNK>; now we train its probabilities like a normal word. At decoding time: if text input, use <UNK> probabilities for any word not in training.
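
A hedged sketch of the <UNK> recipe described above: build a fixed lexicon from frequent training words, map everything else to <UNK>, and apply the same mapping at decoding time. The function names and the count threshold are illustrative assumptions.

```python
from collections import Counter

UNK = "<UNK>"

def build_lexicon(train_tokens, min_count=2):
    """Keep words seen at least min_count times; everything else becomes <UNK>."""
    counts = Counter(train_tokens)
    return {w for w, c in counts.items() if c >= min_count}

def normalize(tokens, lexicon):
    return [w if w in lexicon else UNK for w in tokens]

train = "the cat sat on the mat the cat slept".split()
lexicon = build_lexicon(train)                    # {'the', 'cat'}
print(normalize(train, lexicon))                  # unknowns replaced before training
print(normalize("the dog sat".split(), lexicon))  # ['the', '<UNK>', '<UNK>']
```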

68 Huge web-scale n-grams. How to deal with, e.g., the Google N-gram corpus. Pruning: only store N-grams with count > threshold (remove singletons of higher-order n-grams); entropy-based pruning. Efficiency: efficient data structures like tries; Bloom filters (approximate language models); store words as indexes, not strings (use Huffman coding to fit large numbers of words into two bytes); quantize probabilities (4-8 bits instead of an 8-byte float).

69 Smoothing for Web-scale N-grams. Stupid backoff (Brants et al.). No discounting, just use relative frequencies: S(wi | wi-k+1 ... wi-1) = count(wi-k+1 ... wi) / count(wi-k+1 ... wi-1) if count(wi-k+1 ... wi) > 0; otherwise 0.4 · S(wi | wi-k+2 ... wi-1). Base case: S(wi) = count(wi) / N.
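
A recursive sketch of stupid backoff as described above (it produces a score, not a normalized probability). The 0.4 back-off factor and the count-table layout are assumptions for illustration.

```python
def stupid_backoff(ngram, counts, total_words, alpha=0.4):
    """S(w | context) = count(ngram)/count(context) if the n-gram was seen,
    else alpha * S(w | shorter context); unigram base case count(w)/N."""
    if len(ngram) == 1:
        return counts.get(ngram, 0) / total_words
    context = ngram[:-1]
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[context]
    return alpha * stupid_backoff(ngram[1:], counts, total_words, alpha)

# counts maps word tuples of any order to frequencies, e.g.:
counts = {("I", "want", "food"): 0, ("I", "want"): 3, ("want", "food"): 1,
          ("want",): 5, ("food",): 2}
print(stupid_backoff(("I", "want", "food"), counts, total_words=100))  # 0.4 * 1/5
```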

70 Outline: Motivation; Task Definition; Probability Estimation; Evaluation; Smoothing (Simple Interpolation and Back-off; Advanced Algorithms)

71 Advanced smoothing algorithms. Many advanced smoothing algorithms (Good-Turing, Kneser-Ney, Witten-Bell) use the count of things we've seen once to help estimate the count of things we've never seen.

72 Notation: N_c = frequency of frequency c, i.e. the count of things we've seen c times. "Sam I am I am Sam I do not eat": I 3, sam 2, am 2, do 1, not 1, eat 1; so N_1 = 3, N_2 = 2, N_3 = 1.

73 Good-Turing smoothing intuition. You ask people their favorite game (a scenario inspired by Josh Goodman) and find: 10 cricket, 3 tennis, 2 soccer, 1 carrom, 1 chess, 1 rummy = 18 games. How likely is it that the next answer is carrom? 1/18. How likely is it that the next game is new (i.e. staapu or TT)? Let's use our estimate of things-we-saw-once to estimate the new things: 3/18 (because N_1 = 3). Assuming so, how likely is it that the next game is carrom? Must be less than 1/18. How to estimate?

74 Good-Turing calculations. P*_GT(things with zero frequency) = N_1 / N and c* = (c+1) N_{c+1} / N_c. Unseen (staapu or TT): c = 0; MLE p = 0/18 = 0; P*_GT(unseen) = N_1/N = 3/18. Seen once (carrom): c = 1; MLE p = 1/18; c*(carrom) = 2 × N_2/N_1 = 2 × 1/3 = 2/3; P*_GT(carrom) = (2/3)/18 = 1/27.
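
The slide's carrom/staapu arithmetic can be reproduced in a few lines. A sketch (mine) of the basic Good-Turing re-estimate c* = (c+1) N_{c+1} / N_c:

```python
from collections import Counter

# Favorite-game answers from the slide: 18 answers in total.
answers = ["cricket"] * 10 + ["tennis"] * 3 + ["soccer"] * 2 + ["carrom", "chess", "rummy"]
counts = Counter(answers)
N = len(answers)                  # 18
Nc = Counter(counts.values())     # N_1 = 3, N_2 = 1, N_3 = 1, N_10 = 1

def gt_count(c):
    """c* = (c + 1) * N_{c+1} / N_c"""
    return (c + 1) * Nc[c + 1] / Nc[c]

p_unseen = Nc[1] / N              # 3/18
p_carrom = gt_count(1) / N        # (2 * 1/3) / 18 = 1/27
print(p_unseen, p_carrom)
```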

75 Ney et al.'s Good-Turing Intuition. H. Ney, U. Essen, and R. Kneser, 1995. On the estimation of 'small' probabilities by leaving-one-out. IEEE Trans. PAMI 17(12). Held-out words (figure).

76 Ney et al. Good-Turing Intuition, from leave-one-out validation. Take each of the T training words out in turn, giving T training sets of size T-1 with held-out sets of size 1. What fraction of held-out words are unseen in training? N_1/T. What fraction of held-out words are seen c times in training? (c+1) N_{c+1}/T. So in the future we expect (c+1) N_{c+1}/T of the words to be those with training count c. There are N_c words with training count c, so each specific word should occur with probability ((c+1) N_{c+1}/T) / N_c, or expected count c* = (c+1) N_{c+1} / N_c. (Figure: training counts N_1, N_2, N_3, ... vs. held-out counts N_0, N_1, N_2, ...)

77 Good-Turing complications. Problem: what about "the"? (say c = 4417). For small c, N_c > N_{c+1}; for large c, the counts are too jumpy, and zeros wreck the estimates. Simple Good-Turing [Gale and Sampson]: replace the empirical N_c with a best-fit power law once counts get unreliable.

78 Resulting Good-Turing numbers. Numbers from Church and Gale, 22 million words of AP Newswire: c* = (c+1) N_{c+1} / N_c. (Table: count c vs. Good-Turing c*.)

79 Resulting Good-Turing numbers. Numbers from Church and Gale, 22 million words of AP Newswire: c* = (c+1) N_{c+1} / N_c. It sure looks like c* = c - .75. (Table: count c vs. Good-Turing c*.)

80 Absolute Discounting. Save ourselves some time and just subtract 0.75 (or some d): P_AbsoluteDiscounting(wi | wi-1) = (c(wi-1, wi) - d) / c(wi-1) (discounted bigram). Maybe keep a couple of extra values of d for counts 1 and 2. Rarely used by itself.

81 Absolute Discounting with Backoff. Divide the words into seen and unseen. Backoff: P_AD(wi | wi-1) = (c(wi-1, wi) - d) / c(wi-1) if wi was seen after wi-1; = α(wi-1) P(wi) / Σ_{w ∈ B(wi-1)} P(w) otherwise. Can consider hierarchical formulations: a trigram is recursively backed off to the Katz bigram estimate, etc. Can also have multiple count thresholds (instead of just 0 and >0).

82 Absolute Discounting Interpolation. P_AbsoluteDiscounting(wi | wi-1) = (c(wi-1, wi) - d) / c(wi-1) + λ(wi-1) P(wi): a discounted bigram plus an interpolation weight times the unigram. But should we really just use the regular unigram P(w)?

83 Kneser-Ney Smoothing I. Better estimate for the probabilities of lower-order unigrams! Shannon game: "I can't see without my reading ___?" "Francisco" is more common than "glasses", but "Francisco" always follows "San". The unigram is useful exactly when we haven't seen this bigram! Instead of P(w) (how likely is w?), use P_continuation(w): how likely is w to appear as a novel continuation? For each word, count the number of bigram types it completes; every bigram type was a novel continuation the first time it was seen: P_CONTINUATION(w) ∝ |{wi-1 : c(wi-1, w) > 0}|

84 Kneser-Ney Smoothing II. How many times does w appear as a novel continuation: P_CONTINUATION(w) ∝ |{wi-1 : c(wi-1, w) > 0}|, normalized by the total number of word bigram types |{(wj-1, wj) : c(wj-1, wj) > 0}|: P_CONTINUATION(w) = |{wi-1 : c(wi-1, w) > 0}| / |{(wj-1, wj) : c(wj-1, wj) > 0}|

85 Kneser-Ney Smoothing III. Alternative metaphor: the number of word types seen to precede w, |{wi-1 : c(wi-1, w) > 0}|, normalized by the number of word types preceding all words: P_CONTINUATION(w) = |{wi-1 : c(wi-1, w) > 0}| / Σ_{w'} |{w'i-1 : c(w'i-1, w') > 0}|. A frequent word (Francisco) occurring in only one context (San) will have a low continuation probability.

86 Kneser-Ney Smoothing IV. P_KN(wi | wi-1) = max(c(wi-1, wi) - d, 0) / c(wi-1) + λ(wi-1) P_CONTINUATION(wi). λ is a normalizing constant, the probability mass we've discounted: λ(wi-1) = (d / c(wi-1)) · |{w : c(wi-1, w) > 0}|, i.e. the normalized discount times the number of word types that can follow wi-1 (= the number of word types we discounted = the number of times we applied the normalized discount).

87 Kneser-Ney Smoothing: Recursive formulation. P_KN(wi | wi-n+1 ... wi-1) = max(c_KN(wi-n+1 ... wi) - d, 0) / c_KN(wi-n+1 ... wi-1) + λ(wi-n+1 ... wi-1) P_KN(wi | wi-n+2 ... wi-1), where c_KN(·) is the ordinary count for the highest order and the continuation count for lower orders. Continuation count = number of unique single-word contexts for ·.
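
To make interpolated Kneser-Ney concrete for the bigram case, here is a rough sketch (my own, not the lecture's code): discounted bigram counts plus λ(wi-1) times the continuation probability.

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, d=0.75):
    """Return P_KN(w | prev) for an interpolated KN bigram model (sketch)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    followers = defaultdict(set)      # word types seen after each context
    preceders = defaultdict(set)      # contexts each word completes
    for prev, w in bigrams:
        followers[prev].add(w)
        preceders[w].add(prev)
    num_bigram_types = len(bigrams)

    def p_continuation(w):
        # |{prev : c(prev, w) > 0}| / |{(prev, w) : c(prev, w) > 0}|
        return len(preceders[w]) / num_bigram_types

    def p_kn(w, prev):
        discounted = max(bigrams[(prev, w)] - d, 0) / unigrams[prev]
        lam = (d / unigrams[prev]) * len(followers[prev])   # discounted mass
        return discounted + lam * p_continuation(w)

    return p_kn

p_kn = kneser_ney_bigram("<s> I want to eat I want food </s>".split())
print(p_kn("want", "I"))   # high: 'want' follows 'I' twice
print(p_kn("food", "I"))   # low, but nonzero via the continuation term
```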

88 What Actually Works? Trigrams and beyond: unigrams and bigrams are generally useless; trigrams are much better when there is enough data; 4- and 5-grams are really useful in MT, but not so much for speech. Discounting: absolute discounting, Good-Turing, held-out estimation, Witten-Bell, etc. See the [Chen+Goodman] reading for tons of graphs. [Graphs from Joshua Goodman]

89 N-gram Smoothing Summary. Add-1 smoothing: OK for text categorization, not for language modeling. The most commonly used method: interpolated Kneser-Ney (modified to three discounts). For very large N-grams like the Web: stupid backoff.

90 Data vs. Method? Having more data is better, but so is using a better estimator. Another issue: N > 3 has huge costs in speech recognizers. (Figure: entropy vs. n-gram order for 100,000 / 1,000,000 / 10,000,000 / all training words, Katz vs. KN.)

91 Tons of Data? Tons of data closes the gap, for extrinsic MT evaluation.

92 Beyond N-Gram LMs. Lots of ideas we won't have time to discuss: Caching models: recent words are more likely to appear again. Trigger models: recent words trigger other words. Topic models. A few recent ideas: Syntactic models: use tree models to capture long-distance syntactic effects [Chelba and Jelinek, 98]. Discriminative models: set n-gram weights to improve final task accuracy rather than fit training-set density [Roark, 05, for ASR; Liang et al., 06, for MT]. Structural zeros: some n-grams are syntactically forbidden, keep estimates at zero [Mohri and Roark, 06]. Bayesian document and IR models [Daume 06].
