Maxent Models and Discriminative Estimation


1 Maxent Models and Discriminative Estimation
Generative vs. Discriminative models
Christopher Manning

2 Christopher Manning
Introduction
So far we've looked at generative models: language models, Naive Bayes. But there is now much use of conditional or discriminative probabilistic models in NLP, speech, IR (and ML generally), because:
- They give high accuracy performance.
- They make it easy to incorporate lots of linguistically important features.
- They allow automatic building of language-independent, retargetable NLP modules.

3 Christopher Manning
Joint vs. Conditional Models
We have some data {(d, c)} of paired observations d and hidden classes c.
Joint (generative) models place probabilities over both the observed data and the hidden stuff, and generate the observed data from the hidden stuff: P(c, d).
All the classic StatNLP models are of this type: n-gram models, Naive Bayes classifiers, hidden Markov models, probabilistic context-free grammars, IBM machine translation alignment models.

4 Christopher Manning
Joint vs. Conditional Models
Discriminative (conditional) models take the data as given, and put a probability over hidden structure given the data: P(c | d).
Examples: logistic regression, conditional loglinear or maximum entropy models, conditional random fields.
Also, SVMs, (averaged) perceptron, etc. are discriminative classifiers, but not directly probabilistic.

5 Christopher Manning
Bayes Net/Graphical Models
Bayes net diagrams draw circles for random variables and lines for direct dependencies. Some variables are observed; some are hidden. Each node is a little classifier (conditional probability table) based on incoming arcs.
[Diagrams: c -> d1, d2, d3 is Naive Bayes (generative); d1, d2, d3 -> c is logistic regression (discriminative).]

6 Christopher Manning
Conditional vs. Joint Likelihood
A joint model gives probabilities P(d, c) and tries to maximize this joint likelihood. It turns out to be trivial to choose weights: just relative frequencies.
A conditional model gives probabilities P(c | d). It takes the data as given and models only the conditional probability of the class. We seek to maximize conditional likelihood. This is harder to do (as we'll see), but more closely related to classification error.

7 Christopher Manning
Conditional models work well: Word Sense Disambiguation
Even with exactly the same features, changing from joint to conditional estimation increases performance. That is, we use the same smoothing and the same word-class features; we just change the numbers (parameters).
[Table: training-set and test-set accuracy under the joint-likelihood vs. conditional-likelihood objectives. Klein and Manning (2002), using Senseval-1 data.]

8 Maxent Models and Discriminative Estimation
Generative vs. Discriminative models
Christopher Manning

9 Discriminative Model Features
Making features from text for discriminative NLP models
Christopher Manning

10 Christopher Manning
Features
In these slides and most maxent work: features f are elementary pieces of evidence that link aspects of what we observe d with a category c that we want to predict.
A feature is a function with a bounded real value: f: C × D → ℝ

11 Christopher Manning
Features
In these slides and most maxent work: features f are elementary pieces of evidence that link aspects of what we observe d with a category c that we want to predict.
A feature is a function with a bounded real value.

12 Christopher Manning
Example features
f1(c, d) ≡ [c = LOCATION ∧ w−1 = "in" ∧ isCapitalized(w)]
f2(c, d) ≡ [c = LOCATION ∧ hasAccentedLatinChar(w)]
f3(c, d) ≡ [c = DRUG ∧ ends(w, "c")]
Examples: LOCATION (in Arcadia), LOCATION (in Québec), DRUG (taking Zantac), PERSON (saw Sue).
Models will assign to each feature a weight:
- A positive weight votes that this configuration is likely correct.
- A negative weight votes that this configuration is likely incorrect.

13 Christopher Manning
Example features
f1(c, d) ≡ [c = LOCATION ∧ w−1 = "in" ∧ isCapitalized(w)]
f2(c, d) ≡ [c = LOCATION ∧ hasAccentedLatinChar(w)]
f3(c, d) ≡ [c = DRUG ∧ ends(w, "c")]
Examples: LOCATION (in Arcadia), LOCATION (in Québec), DRUG (taking Zantac), PERSON (saw Sue).
Models will assign to each feature a weight:
- A positive weight votes that this configuration is likely correct.
- A negative weight votes that this configuration is likely incorrect.
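To make the indicator idea concrete, here is a minimal Python sketch (not the lecture's code) of the three example features as functions over a (class, datum) pair. The datum representation and the helper predicates are illustrative assumptions.

```python
import unicodedata

def is_capitalized(w):
    return w[:1].isupper()

def has_accented_latin_char(w):
    # A character counts as accented if it canonically decomposes
    # into a base letter plus combining marks (e.g. "é").
    return any(unicodedata.decomposition(ch) for ch in w)

def f1(c, d):  # [c = LOCATION and w-1 = "in" and isCapitalized(w)]
    return int(c == "LOCATION" and d["prev"] == "in" and is_capitalized(d["word"]))

def f2(c, d):  # [c = LOCATION and hasAccentedLatinChar(w)]
    return int(c == "LOCATION" and has_accented_latin_char(d["word"]))

def f3(c, d):  # [c = DRUG and ends(w, "c")]
    return int(c == "DRUG" and d["word"].endswith("c"))

d = {"word": "Québec", "prev": "in"}
print(f1("LOCATION", d), f2("LOCATION", d), f3("DRUG", d))  # -> 1 1 1
```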

14 Christopher Manning
Feature Expectations
We will crucially make use of two expectations: actual or predicted counts of a feature firing.
Empirical count (expectation) of a feature:
$E_{\text{empirical}}(f_i) = \sum_{(c,d) \in \text{observed}(C,D)} f_i(c,d)$
Model expectation of a feature:
$E(f_i) = \sum_{(c,d) \in (C,D)} P(c,d)\, f_i(c,d)$
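A small sketch of these two expectations, assuming features are Python functions as in the previous sketch. For the model expectation, the sum is taken, as is common in practice, over the observed d's with a hypothetical conditional model p_c_given_d.

```python
def empirical_expectation(f, data):
    # data: list of observed (c, d) pairs
    return sum(f(c, d) for c, d in data)

def model_expectation(f, data, classes, p_c_given_d):
    # Sum of P(c' | d) * f(c', d) over observed d's and all classes c'.
    return sum(p_c_given_d(c2, d) * f(c2, d)
               for _, d in data for c2 in classes)
```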

15 Christopher Manning
Features
In NLP uses, usually a feature specifies (1) an indicator function, a yes/no boolean matching function, of properties of the input, and (2) a particular class:
f_i(c, d) ≡ [Φ(d) ∧ c = c_j]   [value is 0 or 1]
Each feature picks out a data subset and suggests a label for it.
We will say that Φ(d) is a feature of the data d when, for each c_j, the conjunction Φ(d) ∧ c = c_j is a feature of the data-class pair (c, d).

16 Christopher Manning
Features
In NLP uses, usually a feature specifies (1) an indicator function, a yes/no boolean matching function, of properties of the input, and (2) a particular class:
f_i(c, d) ≡ [Φ(d) ∧ c = c_j]   [value is 0 or 1]
Each feature picks out a data subset and suggests a label for it.

17 Christopher Manning
Feature-Based Models
The decision about a data point is based only on the features active at that point.
Text Categorization. Data: "... BUSINESS: Stocks hit a yearly low ..." Label: BUSINESS. Features: {..., stocks, hit, a, yearly, low, ...}
Word-Sense Disambiguation. Data: "... to restructure bank:MONEY debt." Label: MONEY. Features: {..., w−1 = restructure, w+1 = debt, L = 12, ...}
POS Tagging. Data: "DT JJ NN ... The previous fall ..." Label: NN. Features: {w = fall, t−1 = JJ, w−1 = previous}

18 Christopher Manning
Example: Text Categorization
(Zhang and Oles 2001)
Features are presence of each word in a document and the document class (they do feature selection to use reliable indicator words).
Tests on classic Reuters data set (and others):
Naïve Bayes: 77.0% F1
Linear regression: 86.0%
Logistic regression: 86.4%
Support vector machine: 86.5%
The paper emphasizes the importance of regularization (smoothing) for successful use of discriminative methods, which was not used in much early NLP/IR work.

19 Christopher Manning
Other Maxent Classifier Examples
You can use a maxent classifier whenever you want to assign data points to one of a number of classes:
- Sentence boundary detection (Mikheev 2000): Is a period the end of a sentence or an abbreviation?
- Sentiment analysis (Pang and Lee 2002): word unigrams, bigrams, POS counts, ...
- PP attachment (Ratnaparkhi 1998): Attach to verb or noun? Features of head noun, preposition, etc.
- Parsing decisions in general (Ratnaparkhi 1997; Johnson et al.; etc.)

20 Discriminative Model Features
Making features from text for discriminative NLP models
Christopher Manning

21 Feature-Based Linear Classifiers
How to put features into a classifier

22 Christopher Manning
Feature-Based Linear Classifiers
Linear classifiers at classification time:
- Linear function from feature sets {f_i} to classes {c}.
- Assign a weight λ_i to each feature f_i.
- We consider each class for an observed datum d.
- For a pair (c, d), features vote with their weights: vote(c) = Σ λ_i f_i(c, d).
Example classes for "in Québec": PERSON, LOCATION, DRUG.
- Choose the class c which maximizes Σ λ_i f_i(c, d).

23 Christopher Manning
Feature-Based Linear Classifiers
Linear classifiers at classification time:
- Linear function from feature sets {f_i} to classes {c}.
- Assign a weight λ_i to each feature f_i.
- We consider each class for an observed datum d.
- For a pair (c, d), features vote with their weights: vote(c) = Σ λ_i f_i(c, d).
For "in Québec": vote(PERSON) = 0, vote(LOCATION) = 1.8 - 0.6 = 1.2, vote(DRUG) = 0.3.
- Choose the class c which maximizes Σ λ_i f_i(c, d) = LOCATION.

24 Christopher Manning
Feature-Based Linear Classifiers
There are many ways to choose weights for features:
- Perceptron: find a currently misclassified example, and nudge the weights in the direction of its correct classification.
- Margin-based methods (Support Vector Machines).

25 Christopher Manning
Feature-Based Linear Classifiers
Exponential (log-linear, maxent, logistic, Gibbs) models: make a probabilistic model from the linear combination Σ λ_i f_i(c, d):

$P(c \mid d, \lambda) = \dfrac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$

exp makes the votes positive; the sum over c' normalizes the votes.

P(LOCATION | in Québec) = e^{1.8} e^{-0.6} / (e^{1.8} e^{-0.6} + e^{0.3} + e^{0}) = 0.586
P(DRUG | in Québec) = e^{0.3} / (e^{1.8} e^{-0.6} + e^{0.3} + e^{0}) = 0.238
P(PERSON | in Québec) = e^{0} / (e^{1.8} e^{-0.6} + e^{0.3} + e^{0}) = 0.176

The weights are the parameters of the probability model, combined via a "soft max" function.
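These numbers can be checked directly. A minimal sketch, hard-coding the vote totals implied by the three weights above:

```python
from math import exp

votes = {"LOCATION": 1.8 - 0.6,  # f1 and f2 both fire on "in Québec"
         "DRUG": 0.3,            # f3 fires ("Québec" ends in "c")
         "PERSON": 0.0}          # no feature fires

Z = sum(exp(v) for v in votes.values())   # normalizer
for c, v in votes.items():                # soft max of the votes
    print(c, round(exp(v) / Z, 3))
# LOCATION 0.586, DRUG 0.238, PERSON 0.176
```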

26 Christopher Manning
Feature-Based Linear Classifiers
Exponential (log-linear, maxent, logistic, Gibbs) models:
Given this model form, we will choose parameters {λ_i} that maximize the conditional likelihood of the data according to this model.
We construct not only classifications, but probability distributions over classifications.
There are other (good!) ways of discriminating classes (SVMs, boosting, even perceptrons), but these methods are not as trivial to interpret as distributions over classes.

27 Christopher Manning
Aside: logistic regression
Maxent models in NLP are essentially the same as multiclass logistic regression models in statistics (or machine learning).
If you haven't seen these before, don't worry, this presentation is self-contained! If you have seen these before, you might think about:
- The parameterization is slightly different, in a way that is advantageous for NLP-style models with tons of sparse features, but statistically inelegant.
- The key role of feature functions in NLP and in this presentation.
- The features are more general, with f also being a function of the class. When might this be useful?

28 Christopher Manning
Quiz Question
Assuming exactly the same setup (3-class decision: LOCATION, PERSON, or DRUG; 3 features as before), maxent, what are:
P(PERSON | by Goéric) = ?
P(LOCATION | by Goéric) = ?
P(DRUG | by Goéric) = ?
1.8  f1(c, d) ≡ [c = LOCATION ∧ w−1 = "in" ∧ isCapitalized(w)]
-0.6 f2(c, d) ≡ [c = LOCATION ∧ hasAccentedLatinChar(w)]
0.3  f3(c, d) ≡ [c = DRUG ∧ ends(w, "c")]

$P(c \mid d, \lambda) = \dfrac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$

29 Feature-Based Linear Classifiers
How to put features into a classifier

30 Building a Maxent Model
The nuts and bolts

31 Christopher Manning
Building a Maxent Model
We define features (indicator functions) over data points. Features represent sets of data points which are distinctive enough to deserve model parameters: words, but also "word contains number", "word ends with ing", etc.
We will simply encode each Φ feature as a unique String. A datum will give rise to a set of Strings: the active Φ features. Each feature f_i(c, d) ≡ [Φ(d) ∧ c = c_j] gets a real number weight.
We concentrate on Φ features, but the math uses i indices of f_i.
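A sketch of this String-encoding bookkeeping; the extractor and its feature names are invented for illustration.

```python
feature_index = {}   # (phi_string, class) -> integer index i

def index_of(phi, c, add=True):
    key = (phi, c)
    if key not in feature_index and add:
        feature_index[key] = len(feature_index)
    return feature_index.get(key, -1)

def active_phi_features(d):
    # Illustrative extractor: a datum gives rise to a set of Strings.
    w = d["word"]
    feats = {"word=" + w}
    if any(ch.isdigit() for ch in w):
        feats.add("word-contains-number")
    if w.endswith("ing"):
        feats.add("word-ends-with-ing")
    return feats

print([index_of(phi, "VERB") for phi in active_phi_features({"word": "running"})])
```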

32 Christopher Manning
Building a Maxent Model
Features are often added during model development to target errors. Often, the easiest thing to think of are features that mark bad combinations.
Then, for any given feature weights, we want to be able to calculate:
- Data conditional likelihood.
- Derivative of the likelihood wrt each feature weight. This uses expectations of each feature according to the model.
We can then find the optimum feature weights (discussed later).

33 Building a Maxent Model
The nuts and bolts

34 Naive Bayes vs. Maxent models
Generative vs. Discriminative models: the problem of overcounting evidence
Christopher Manning

35 Christopher Manning
Text classification: Asia or Europe
[Training data: Europe documents containing "Monaco"; Asia documents containing "Hong Kong" and "Monaco". NB model with a class node and word feature X1 = M; the slide works out the NB factors P(A), P(E), P(M|A), P(M|E) and the predictions P(A,M), P(E,M), P(A|M), P(E|M).]

36 Christopher Manning
Text classification: Asia or Europe
[Same training data; NB model with word features X1 = H, X2 = K. The slide works out the NB factors P(A), P(E), P(H|A), P(K|A), P(H|E), P(K|E) and the predictions P(A,H,K), P(E,H,K), P(A|H,K), P(E|H,K).]

37 Christopher Manning
Text classification: Asia or Europe
[Same training data; NB model with word features H, K, M. The slide works out the NB factors P(A), P(E), P(M|A), P(M|E), P(H|A), P(K|A), P(H|E), P(K|E) and the predictions P(A,H,K,M), P(E,H,K,M), P(A|H,K,M), P(E|H,K,M).]

38 Christopher Manning
Naive Bayes vs. Maxent Models
Naive Bayes models multi-count correlated evidence: each feature is multiplied in, even when you have multiple features telling you the same thing.
Maximum Entropy models (pretty much) solve this problem. As we will see, this is done by weighting features so that model expectations match the observed (empirical) expectations.

39 Naive Bayes vs. Maxent models
Generative vs. Discriminative models: the problem of overcounting evidence
Christopher Manning

40 Maxent Models and Discriminative Estimation
Maximizing the likelihood

41 Christopher Manning
Exponential Model Likelihood
Maximum (Conditional) Likelihood Models: given a model form, choose values of parameters to maximize the (conditional) likelihood of the data.

$\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log \dfrac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$

42 Christopher Manning
The Likelihood Value
The (log) conditional likelihood of iid data (C, D) according to a maxent model is a function of the data and the parameters λ:

$\log P(C \mid D, \lambda) = \log \prod_{(c,d) \in (C,D)} P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda)$

If there aren't many values of c, it's easy to calculate:

$\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \dfrac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$

43 Christopher Manning
The Likelihood Value
We can separate this into two components:

$\log P(C \mid D, \lambda) = \underbrace{\sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c,d)}_{N(\lambda)} - \underbrace{\sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}_{M(\lambda)}$

The derivative is the difference between the derivatives of each component.

44 Christopher Manning
The Derivative I: Numerator

$\dfrac{\partial N(\lambda)}{\partial \lambda_i} = \dfrac{\partial \sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c,d)}{\partial \lambda_i} = \dfrac{\partial \sum_{(c,d) \in (C,D)} \sum_i \lambda_i f_i(c,d)}{\partial \lambda_i} = \sum_{(c,d) \in (C,D)} f_i(c,d)$

The derivative of the numerator is the empirical count(f_i, C).

45 Christopher Manning
The Derivative II: Denominator

$\dfrac{\partial M(\lambda)}{\partial \lambda_i} = \dfrac{\partial \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}{\partial \lambda_i}$
$= \sum_{(c,d) \in (C,D)} \dfrac{1}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'',d)} \sum_{c'} \dfrac{\partial \exp \sum_i \lambda_i f_i(c',d)}{\partial \lambda_i}$
$= \sum_{(c,d) \in (C,D)} \sum_{c'} \dfrac{\exp \sum_i \lambda_i f_i(c',d)}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'',d)} \, f_i(c',d)$
$= \sum_{(c,d) \in (C,D)} \sum_{c'} P(c' \mid d, \lambda) \, f_i(c',d)$

The derivative of the denominator is the predicted count(f_i, λ).

46 Christopher Manning
The Derivative III

$\dfrac{\partial \log P(C \mid D, \lambda)}{\partial \lambda_i} = \text{actual count}(f_i, C) - \text{predicted count}(f_i, \lambda)$

The optimum parameters are the ones for which each feature's predicted expectation equals its empirical expectation. The optimum distribution is:
- Always unique (but parameters may not be unique).
- Always exists (if feature counts are from actual data).
These models are also called maximum entropy models because we find the model having maximum entropy and satisfying the constraints: $E_p(f_j) = E_{\tilde{p}}(f_j), \; \forall j$

47 Christopher Manning
Finding the optimal parameters
We want to choose parameters λ1, λ2, λ3, ... that maximize the conditional log-likelihood of the training data:

$\text{CLogLik}(D) = \sum_{i=1}^{n} \log P(c_i \mid d_i)$

To be able to do that, we've worked out how to calculate the function value and its partial derivatives (its gradient).
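Putting this section together, a sketch of CLogLik and its gradient (empirical count minus predicted count), assuming the features of a (c, d) pair are available as a dense NumPy vector fvec(c, d):

```python
import numpy as np

def log_probs(lam, d, classes, fvec):
    # log P(c | d, lambda) for every class, computed stably
    scores = np.array([fvec(c, d) @ lam for c in classes])
    scores -= scores.max()
    return scores - np.log(np.exp(scores).sum())

def clog_lik_and_grad(lam, data, classes, fvec):
    ll, grad = 0.0, np.zeros_like(lam)
    for c, d in data:
        lp = log_probs(lam, d, classes, fvec)
        ll += lp[classes.index(c)]
        grad += fvec(c, d)                   # empirical count
        p = np.exp(lp)
        for j, c2 in enumerate(classes):     # minus predicted count
            grad -= p[j] * fvec(c2, d)
    return ll, grad
```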

48 Christopher Manning
A likelihood surface

49 Christopher Manning
Finding the optimal parameters
Use your favorite numerical optimization package. Commonly (and in our code), you minimize the negative of CLogLik.
1. Gradient descent (GD); stochastic gradient descent (SGD)
2. Iterative proportional fitting methods: Generalized Iterative Scaling (GIS) and Improved Iterative Scaling (IIS)
3. Conjugate gradient (CG), perhaps with preconditioning
4. Quasi-Newton methods: limited memory variable metric (LMVM) methods, in particular, L-BFGS
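As an example of option 4, a sketch that hands the negated objective and gradient from the previous sketch to SciPy's L-BFGS implementation (a stand-in for, not a copy of, the lecture's own code):

```python
import numpy as np
from scipy.optimize import minimize

def train(data, classes, fvec, dim):
    def neg(lam):
        ll, grad = clog_lik_and_grad(lam, data, classes, fvec)
        return -ll, -grad                  # minimize the negative of CLogLik
    result = minimize(neg, np.zeros(dim), jac=True, method="L-BFGS-B")
    return result.x                        # the fitted weights lambda
```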

50 Maxent Models and Discriminative Estimation
Maximizing the likelihood

51 Maxent Models and Discriminative Estimation
The maximum entropy model presentation

52 Christopher Manning
Maximum Entropy Models
An equivalent approach: lots of distributions out there, most of them very spiked, specific, overfit. We want a distribution which is uniform except in specific ways we require. Uniformity means high entropy; we can search for distributions which have the properties we desire, but also have high entropy.
"Ignorance is preferable to error and he is less remote from the truth who believes nothing than he who believes what is wrong." Thomas Jefferson (1781)

53 Christopher Manning
Maximum Entropy
Entropy: the uncertainty of a distribution.
Quantifying uncertainty ("surprise"): for an event x with probability $p_x$, the surprise is $\log(1/p_x)$.
Entropy: expected surprise (over p):

$H(p) = E_p\!\left[\log_2 \frac{1}{p_x}\right] = -\sum_x p_x \log_2 p_x$

[Figure: H(p) for a coin flip as a function of P(HEADS).] A coin flip is most uncertain for a fair coin.
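A quick numeric check of the coin-flip claim:

```python
from math import log2

def coin_entropy(p):
    # expected surprise, in bits, for P(HEADS) = p
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, round(coin_entropy(p), 3))   # maximal (1 bit) at p = 0.5
```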

54 Christopher Manning
Maxent Examples I
What do we want from a distribution?
- Minimize commitment = maximize entropy.
- Resemble some reference distribution (data).
Solution: maximize entropy H, subject to feature-based constraints: $E_p[f_i] = E_{\hat{p}}[f_i]$.
[Figure: unconstrained maximum at 0.5; with the constraint that p(HEADS) = 0.3.]
Adding constraints (features):
- Lowers maximum entropy
- Raises maximum likelihood of data
- Brings the distribution further from uniform
- Brings the distribution closer to data

55 Christopher Manning
Maxent Examples II
[Figure: H(p_H, p_T) with the constraints p_H + p_T = 1 and p_H = 0.3; the curve -x log x peaks at 1/e.]

56 Christopher Manning
Maxent Examples III
Let's say we have the following event space:
NN   NNS  NNP  NNPS  VBZ  VBD
and the following empirical data:
3    5    11   13    3    1
Maximize H:
1/e  1/e  1/e  1/e   1/e  1/e
...but we want probabilities: E[p_NN + p_NNS + p_NNP + p_NNPS + p_VBZ + p_VBD] = 1, giving
1/6  1/6  1/6  1/6   1/6  1/6

57 Christopher Manning
Maxent Examples IV
Too uniform! N* are more common than V*, so we add the feature f_N = {NN, NNS, NNP, NNPS}, with E[f_N] = 32/36:
NN    NNS   NNP   NNPS  VBZ   VBD
8/36  8/36  8/36  8/36  2/36  2/36
...and proper nouns are more frequent than common nouns, so we add f_P = {NNP, NNPS}, with E[f_P] = 24/36:
4/36  4/36  12/36 12/36 2/36  2/36
...and we could keep refining the models, e.g., by adding a feature to distinguish singular vs. plural nouns, or verb types.
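These numbers can be reproduced by solving the constrained entropy maximization directly; a sketch using SciPy's SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

f_N = np.array([1, 1, 1, 1, 0, 0])   # NN NNS NNP NNPS VBZ VBD: any noun
f_P = np.array([0, 0, 1, 1, 0, 0])   # proper nouns only

def neg_entropy(p):
    return np.sum(p * np.log(np.clip(p, 1e-12, 1)))

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1},
        {"type": "eq", "fun": lambda p: f_N @ p - 32 / 36},
        {"type": "eq", "fun": lambda p: f_P @ p - 24 / 36}]
res = minimize(neg_entropy, np.full(6, 1 / 6), constraints=cons,
               bounds=[(0, 1)] * 6, method="SLSQP")
print(np.round(res.x * 36, 2))   # -> [4. 4. 12. 12. 2. 2.], i.e. in 36ths
```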

58 Christopher Manning
Convexity
[Figure: a convex function vs. a non-convex function.]
Convexity guarantees a single, global maximum, because any higher points are greedily reachable.

59 Christopher Manning
Convexity II
Constrained $H(p) = -\sum x \log x$ is convex:
- $-x \log x$ is convex
- $-\sum x \log x$ is convex (a sum of convex functions is convex)
- The feasible region of constrained H is a linear subspace (which is convex)
- The constrained entropy surface is therefore convex.
The maximum likelihood exponential model (dual) formulation is also convex.

60 Feature Overlap/Feature Interaction
How overlapping features work in maxent models

61 Christopher Manning
Feature Overlap
Maxent models handle overlapping features well. Unlike a NB model, there is no double counting!
Empirical counts:
      A  a
  B   2  1
  b   2  1
With no features (All = 1), every cell gets 1/4. With the feature A (E[A] = 2/3):
      A    a
  B   1/3  1/6
  b   1/3  1/6
Adding the same feature A again changes nothing: the distribution is the same, and the old weight is simply split between the two copies of the feature (λ'A + λ''A).

62 Christopher Manning
Example: Named Entity Feature Overlap
Grace is correlated with PERSON, but does not add much evidence on top of already knowing prefix features.
Local context: Prev = at, Cur = Grace, Next = Road; State = Other, ?, ?; Tag = IN, NNP, NNP; Sig = x, Xx, Xx.
[Feature-weights table (PERS vs. LOC) for: previous word "at"; current word "Grace"; beginning bigram "<G"; current POS tag NNP; prev and cur tags IN NNP; previous state Other; current signature Xx; prev state + cur sig O-Xx; prev-cur-next sig x-Xx-Xx; p. state + p-cur sig O-x-Xx.]

63 Christopher Manning
Feature Interaction
Maxent models handle overlapping features well, but do not automatically model feature interactions.
Empirical counts:
      A  a
  B   1  1
  b   1  0
With no features (All = 1), every cell gets 1/4. With E[A] = 2/3:
      A    a
  B   1/3  1/6
  b   1/3  1/6
Adding E[B] = 2/3 as well gives the independent product distribution:
      A    a
  B   4/9  2/9
  b   2/9  1/9
The parameters are λA on the A column and λB on the B row; no parameter touches the joint A∧B cell.

64 Christopher Manning
Feature Interaction
If you want interaction terms, you have to add them:
Empirical counts:
      A  a
  B   1  1
  b   1  0
With E[A] = 2/3: 1/3 and 1/6 in the A and a columns, as before. Adding E[B] = 2/3: the product distribution 4/9, 2/9, 2/9, 1/9. Adding the interaction E[A∧B] = 1/3 recovers the empirical distribution:
      A    a
  B   1/3  1/3
  b   1/3  0
A disjunctive feature A∨B would also have done it, alone:
      A    a
  B   1/3  1/3
  b   1/3  0
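A sketch that checks these tables by fitting the model numerically, with and without the interaction feature. (With the interaction, the maximum-likelihood weights grow without bound as the a∧b cell is driven toward 0, so the fitted values are only approximately 1/3, 1/3, 1/3, 0.)

```python
import numpy as np
from scipy.optimize import minimize

outcomes = [("A", "B"), ("a", "B"), ("A", "b"), ("a", "b")]
counts = np.array([1.0, 1.0, 1.0, 0.0])            # empirical counts

def fit(feats):
    F = np.array([feats(o) for o in outcomes], dtype=float)  # 4 x k
    def neg_ll(lam):
        s = F @ lam
        return -(counts @ (s - np.log(np.exp(s).sum())))
    lam = minimize(neg_ll, np.zeros(F.shape[1]), method="L-BFGS-B").x
    s = F @ lam
    return np.exp(s) / np.exp(s).sum()

no_inter = fit(lambda o: [o[0] == "A", o[1] == "B"])
with_inter = fit(lambda o: [o[0] == "A", o[1] == "B", o == ("A", "B")])
print(np.round(no_inter, 3))    # ~ [4/9 2/9 2/9 1/9]: independence
print(np.round(with_inter, 3))  # ~ [1/3 1/3 1/3 0]: matches empirical
```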

65 Christopher Manning
Suppose we have a 1-feature maxent model built over observed data as shown. What is the constructed model's probability distribution over the four possible outcomes?
Empirical counts:
      A  a
  B   2  1
  b   2  1
[Tables for the single feature, its expectation, and the resulting cell probabilities are left blank on the slide.]

66 Christopher Manning
Feature Interaction
For loglinear/logistic regression models in statistics, it is standard to do a greedy stepwise search over the space of all possible interaction terms. This combinatorial space is exponential in size, but that's okay, as most statistics models only have 4-8 features.
In NLP, our models commonly use hundreds of thousands of features, so that's not okay. Commonly, interaction terms are added by hand, based on linguistic intuitions.

67 Christopher Manning
Example: NER Interaction
Previous-state and current-signature have interactions, e.g. P=PERS-C=Xx indicates C=PERS much more strongly than C=Xx and P=PERS independently. This feature type allows the model to capture this interaction.
Local context: Prev = at, Cur = Grace, Next = Road; State = Other, ?, ?; Tag = IN, NNP, NNP; Sig = x, Xx, Xx.
[Feature-weights table (PERS vs. LOC) over the same features as on the Named Entity Feature Overlap slide.]
