Maxent Models and Discriminative Estimation
1 Maxent Models and Discriminative Estimation. Generative vs. Discriminative models. Christopher Manning
2 Christopher Manning. Introduction. So far we've looked at generative models: language models, Naive Bayes. But there is now much use of conditional or discriminative probabilistic models in NLP, Speech, IR (and ML generally). Because: They give high accuracy performance. They make it easy to incorporate lots of linguistically important features. They allow automatic building of language independent, retargetable NLP modules.
3 Christopher Manning. Joint vs. Conditional Models. We have some data {(d, c)} of paired observations d and hidden classes c. Joint (generative) models place probabilities over both observed data and the hidden stuff, generating the observed data from the hidden stuff: P(c, d). All the classic StatNLP models are of this kind: n-gram models, Naive Bayes classifiers, hidden Markov models, probabilistic context-free grammars, IBM machine translation alignment models.
4 Christopher Manning. Joint vs. Conditional Models. Discriminative (conditional) models take the data as given, and put a probability over hidden structure given the data: P(c|d). Examples: logistic regression, conditional loglinear or maximum entropy models, conditional random fields. Also, SVMs, (averaged) perceptron, etc. are discriminative classifiers, but not directly probabilistic.
5 Christopher Manning. Bayes Net/Graphical Models. Bayes net diagrams draw circles for random variables, and lines for direct dependencies. Some variables are observed; some are hidden. Each node is a little classifier (conditional probability table) based on incoming arcs. [Diagrams: Naive Bayes (generative): class c points to observations d1, d2, d3. Logistic regression (discriminative): observations d1, d2, d3 point to class c.]
6 Christopher Manning. Conditional vs. Joint Likelihood. A joint model gives probabilities P(d, c) and tries to maximize this joint likelihood. It turns out to be trivial to choose weights: just use relative frequencies. A conditional model gives probabilities P(c|d). It takes the data as given and models only the conditional probability of the class. We seek to maximize conditional likelihood. Harder to do (as we'll see). More closely related to classification error.
7 Christopher Manning. Conditional models work well: Word Sense Disambiguation. Even with exactly the same features, changing from joint to conditional estimation increases performance. That is, we use the same smoothing and the same word-class features; we just change the numbers (parameters). (Klein and Manning 2002, using Senseval-1 data.) [Tables of training-set and test-set accuracy for the joint-likelihood vs. conditional-likelihood objectives; the numeric values were lost in transcription.]
8 Maxent Models and Discriminative Estimation. Generative vs. Discriminative models. Christopher Manning
9 Discriminative Model Features. Making features from text for discriminative NLP models. Christopher Manning
10 Christopher Manning. Features. In these slides and most maxent work: features f are elementary pieces of evidence that link aspects of what we observe d with a category c that we want to predict. A feature is a function with a bounded real value: f: C × D → ℝ.
12 Christopher Manning. Example features. f1(c, d) ≡ [c = LOCATION ∧ w−1 = "in" ∧ isCapitalized(w)]. f2(c, d) ≡ [c = LOCATION ∧ hasAccentedLatinChar(w)]. f3(c, d) ≡ [c = DRUG ∧ ends(w, "c")]. Examples: LOCATION (in Arcadia), LOCATION (in Québec), DRUG (taking Zantac), PERSON (saw Sue). Models will assign to each feature a weight: A positive weight votes that this configuration is likely correct. A negative weight votes that this configuration is likely incorrect.
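A minimal sketch of these indicator features in Python. The helper predicates and the dict-based datum layout are assumptions for illustration, not the deck's actual code:

```python
import unicodedata

def is_capitalized(w):
    return w[:1].isupper()

def has_accented_latin_char(w):
    # True if any character canonically decomposes (base letter + accent)
    return any(unicodedata.decomposition(ch) for ch in w)

# Each feature is a 0/1 function of a candidate class c and a datum d
def f1(c, d):  # [c = LOCATION and w-1 = "in" and isCapitalized(w)]
    return int(c == "LOCATION" and d["w_prev"] == "in" and is_capitalized(d["w"]))

def f2(c, d):  # [c = LOCATION and hasAccentedLatinChar(w)]
    return int(c == "LOCATION" and has_accented_latin_char(d["w"]))

def f3(c, d):  # [c = DRUG and ends(w, "c")]
    return int(c == "DRUG" and d["w"].endswith("c"))

d = {"w_prev": "in", "w": "Québec"}
print(f1("LOCATION", d), f2("LOCATION", d), f3("DRUG", d))  # 1 1 1
```

Note that each feature only fires for one particular class: the class is part of the feature, which is what lets different classes get different weights for the same data property.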
14 Christopher Manning. Feature Expectations. We will crucially make use of two expectations: actual or predicted counts of a feature firing. Empirical count (expectation) of a feature: $E_{\text{empirical}}(f_i) = \sum_{(c,d) \in \text{observed}(C,D)} f_i(c,d)$. Model expectation of a feature: $E(f_i) = \sum_{(c,d) \in (C,D)} P(c,d)\, f_i(c,d)$.
15 Christopher Manning. Features. In NLP uses, usually a feature specifies (1) an indicator function (a yes/no boolean matching function) of properties of the input and (2) a particular class: $f_i(c, d) \equiv [\Phi(d) \wedge c = c_j]$ (value is 0 or 1). Features pick out a data subset and suggest a label for it. We will say that Φ(d) is a feature of the data d, when, for each c_j, the conjunction Φ(d) ∧ c = c_j is a feature of the data-class pair (c, d).
17 Christopher Manning. Feature-Based Models. The decision about a data point is based only on the features active at that point. Text Categorization: Data: "... stocks hit a yearly low ..."; Label: BUSINESS; Features: {..., stocks, hit, a, yearly, low, ...}. Word-Sense Disambiguation: Data: "... to restructure bank:MONEY debt."; Label: MONEY; Features: {..., w−1 = restructure, w+1 = debt, L = 12, ...}. POS Tagging: Data: "DT JJ NN ... The previous fall ..."; Label: NN; Features: {w = fall, t−1 = JJ, w−1 = previous}.
18 Christopher Manning. Example: Text Categorization. (Zhang and Oles 2001.) Features are presence of each word in a document and the document class (they do feature selection to use reliable indicator words). Tests on classic Reuters data set (and others). Naïve Bayes: 77.0% F1. Linear regression: 86.0%. Logistic regression: 86.4%. Support vector machine: 86.5%. The paper emphasizes the importance of regularization (smoothing) for successful use of discriminative methods (not used in much early NLP/IR work).
19 Christopher Manning. Other Maxent Classifier Examples. You can use a maxent classifier whenever you want to assign data points to one of a number of classes: Sentence boundary detection (Mikheev 2000): Is a period end of sentence or abbreviation? Sentiment analysis (Pang and Lee 2002): Word unigrams, bigrams, POS counts, ... PP attachment (Ratnaparkhi 1998): Attach to verb or noun? Features of head noun, preposition, etc. Parsing decisions in general (Ratnaparkhi 1997; Johnson et al., etc.)
20 Discriminative Model Features. Making features from text for discriminative NLP models. Christopher Manning
21 Feature-Based Linear Classifiers. How to put features into a classifier.
22 Christopher Manning. Feature-Based Linear Classifiers. Linear classifiers at classification time: Linear function from feature sets {f_i} to classes {c}. Assign a weight λ_i to each feature f_i. We consider each class for an observed datum d. For a pair (c, d), features vote with their weights: vote(c) = Σ_i λ_i f_i(c, d). Candidates: PERSON (in Québec), LOCATION (in Québec), DRUG (in Québec). Choose the class c which maximizes Σ_i λ_i f_i(c, d).
23 Christopher Manning. Feature-Based Linear Classifiers. Linear classifiers at classification time: Linear function from feature sets {f_i} to classes {c}. Assign a weight λ_i to each feature f_i. We consider each class for an observed datum d. For a pair (c, d), features vote with their weights: vote(c) = Σ_i λ_i f_i(c, d). With the example weights from before (λ1 = 1.8, λ2 = −0.6, λ3 = 0.3): vote(PERSON | in Québec) = 0, vote(LOCATION | in Québec) = 1.8 − 0.6 = 1.2, vote(DRUG | in Québec) = 0.3. Choose the class c which maximizes Σ_i λ_i f_i(c, d): LOCATION.
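The voting computation can be sketched directly, using the weights from the slide. The hard-coded accent check is a crude stand-in for a real hasAccentedLatinChar predicate:

```python
LAMBDA = [1.8, -0.6, 0.3]  # weights for f1, f2, f3 from the slide
CLASSES = ["PERSON", "LOCATION", "DRUG"]

def features(c, d):
    w, w_prev = d["w"], d["w_prev"]
    return [
        int(c == "LOCATION" and w_prev == "in" and w[:1].isupper()),
        int(c == "LOCATION" and any(ch in "éèêàçî" for ch in w)),
        int(c == "DRUG" and w.endswith("c")),
    ]

def vote(c, d):
    # vote(c) = sum_i lambda_i * f_i(c, d)
    return sum(l * f for l, f in zip(LAMBDA, features(c, d)))

d = {"w_prev": "in", "w": "Québec"}
for c in CLASSES:
    print(c, vote(c, d))  # PERSON 0, LOCATION ~1.2, DRUG 0.3
print(max(CLASSES, key=lambda c: vote(c, d)))  # LOCATION
```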
24 Christopher Manning. Feature-Based Linear Classifiers. There are many ways to choose weights for features: Perceptron: find a currently misclassified example, and nudge weights in the direction of its correct classification. Margin-based methods (Support Vector Machines).
25 Christopher Manning. Feature-Based Linear Classifiers. Exponential (log-linear, maxent, logistic, Gibbs) models: Make a probabilistic model from the linear combination Σ_i λ_i f_i(c, d): $P(c \mid d, \lambda) = \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$. The exp makes votes positive; the denominator normalizes votes. P(LOCATION | in Québec) = e^{1.8} e^{−0.6} / (e^{1.8} e^{−0.6} + e^{0.3} + e^{0}) = 0.586. P(DRUG | in Québec) = e^{0.3} / (e^{1.8} e^{−0.6} + e^{0.3} + e^{0}) = 0.238. P(PERSON | in Québec) = e^{0} / (e^{1.8} e^{−0.6} + e^{0.3} + e^{0}) = 0.176. The weights are the parameters of the probability model, combined via a "soft max" function.
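The softmax normalization for the "in Québec" example can be verified directly from the votes:

```python
import math

votes = {"LOCATION": 1.8 - 0.6, "DRUG": 0.3, "PERSON": 0.0}
Z = sum(math.exp(v) for v in votes.values())           # normalizer
probs = {c: math.exp(v) / Z for c, v in votes.items()}
for c in ["LOCATION", "DRUG", "PERSON"]:
    print(f"P({c} | in Québec) = {probs[c]:.3f}")
# P(LOCATION | in Québec) = 0.586
# P(DRUG | in Québec) = 0.238
# P(PERSON | in Québec) = 0.176
```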
26 Christopher Manning. Feature-Based Linear Classifiers. Exponential (log-linear, maxent, logistic, Gibbs) models: Given this model form, we will choose parameters {λ_i} that maximize the conditional likelihood of the data according to this model. We construct not only classifications, but probability distributions over classifications. There are other (good!) ways of discriminating classes: SVMs, boosting, even perceptrons, but these methods are not as trivial to interpret as distributions over classes.
27 Christopher Manning. Aside: logistic regression. Maxent models in NLP are essentially the same as multiclass logistic regression models in statistics (or machine learning). If you haven't seen these before, don't worry: this presentation is self-contained! If you have seen these before, you might think about: The parameterization is slightly different, in a way that is advantageous for NLP-style models with tons of sparse features (but statistically inelegant). The key role of feature functions in NLP and in this presentation. The features are more general, with f also being a function of the class; when might this be useful?
28 Christopher Manning. Quiz Question. Assuming exactly the same set up (3-class decision: LOCATION, PERSON, or DRUG; 3 features as before; maxent), what are: P(PERSON | by Goéric) = ? P(LOCATION | by Goéric) = ? P(DRUG | by Goéric) = ? Weights: 1.8 for f1(c, d) ≡ [c = LOCATION ∧ w−1 = "in" ∧ isCapitalized(w)]; −0.6 for f2(c, d) ≡ [c = LOCATION ∧ hasAccentedLatinChar(w)]; 0.3 for f3(c, d) ≡ [c = DRUG ∧ ends(w, "c")]; with $P(c \mid d, \lambda) = \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$.
29 Feature-Based Linear Classifiers. How to put features into a classifier.
30 Building a Maxent Model. The nuts and bolts.
31 Christopher Manning. Building a Maxent Model. We define features (indicator functions) over data points. Features represent sets of data points which are distinctive enough to deserve model parameters: words, but also "word contains number", "word ends with ing", etc. We will simply encode each Φ feature as a unique String. A datum will give rise to a set of Strings: the active Φ features. Each feature f_i(c, d) ≡ [Φ(d) ∧ c = c_j] gets a real number weight. We concentrate on Φ features, but the math uses i indices of f_i.
32 Christopher Manning. Building a Maxent Model. Features are often added during model development to target errors. Often, the easiest thing to think of are features that mark bad combinations. Then, for any given feature weights, we want to be able to calculate: Data conditional likelihood. Derivative of the likelihood wrt each feature weight. Uses expectations of each feature according to the model. We can then find the optimum feature weights (discussed later).
33 Building a Maxent Model. The nuts and bolts.
34 Naive Bayes vs. Maxent models. Generative vs. Discriminative models: the problem of overcounting evidence. Christopher Manning
35 Christopher Manning. Text classification: Asia or Europe. [Training data: documents labeled Europe and Asia, containing the words Monaco and Hong Kong. NB model over the class and the word variable X1 = M (Monaco). NB factors P(A), P(E), P(M|A), P(M|E) and predictions P(A, M), P(E, M), P(A|M), P(E|M); the numeric values were lost in transcription.]
36 Christopher Manning. Text classification: Asia or Europe. [Same training data; NB model with word variables X1 = H (Hong) and X2 = K (Kong). NB factors P(A), P(E), P(H|A), P(K|A), P(H|E), P(K|E) and predictions P(A, H, K), P(E, H, K), P(A|H, K), P(E|H, K); the numeric values were lost in transcription.]
37 Christopher Manning. Text classification: Asia or Europe. [Same training data; NB model with all three word variables H, K, M. NB factors P(A), P(E), P(M|A), P(M|E), P(H|A), P(K|A), P(H|E), P(K|E) and predictions P(A, H, K, M), P(E, H, K, M), P(A|H, K, M), P(E|H, K, M); the numeric values were lost in transcription.]
38 Christopher Manning. Naive Bayes vs. Maxent Models. Naive Bayes models multi-count correlated evidence: each feature is multiplied in, even when you have multiple features telling you the same thing. Maximum Entropy models (pretty much) solve this problem. As we will see, this is done by weighting features so that model expectations match the observed (empirical) expectations.
39 Naive Bayes vs. Maxent models. Generative vs. Discriminative models: the problem of overcounting evidence. Christopher Manning
40 Maxent Models and Discriminative Estimation. Maximizing the likelihood.
41 Christopher Manning. Exponential Model Likelihood. Maximum (Conditional) Likelihood Models: Given a model form, choose values of parameters to maximize the (conditional) likelihood of the data: $\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$
42 Christopher Manning. The Likelihood Value. The (log) conditional likelihood of iid data (C, D) according to a maxent model is a function of the data and the parameters λ: $\log P(C \mid D, \lambda) = \log \prod_{(c,d) \in (C,D)} P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda)$. If there aren't many values of c, it's easy to calculate: $\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$
43 Christopher Manning. The Likelihood Value. We can separate this into two components: $\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c,d) \;-\; \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c',d) = N(\lambda) - M(\lambda)$. The derivative is the difference between the derivatives of each component.
44 Christopher Manning. The Derivative I: Numerator. $\frac{\partial N(\lambda)}{\partial \lambda_i} = \frac{\partial}{\partial \lambda_i} \sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c,d) = \frac{\partial}{\partial \lambda_i} \sum_{(c,d) \in (C,D)} \sum_i \lambda_i f_i(c,d) = \sum_{(c,d) \in (C,D)} f_i(c,d)$. The derivative of the numerator is the empirical count(f_i, C).
45 Christopher Manning. The Derivative II: Denominator. $\frac{\partial M(\lambda)}{\partial \lambda_i} = \frac{\partial}{\partial \lambda_i} \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c',d) = \sum_{(c,d) \in (C,D)} \frac{1}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'',d)} \sum_{c'} \frac{\partial}{\partial \lambda_i} \exp \sum_i \lambda_i f_i(c',d) = \sum_{(c,d) \in (C,D)} \sum_{c'} \frac{\exp \sum_i \lambda_i f_i(c',d)}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'',d)} f_i(c',d) = \sum_{(c,d) \in (C,D)} \sum_{c'} P(c' \mid d, \lambda)\, f_i(c',d) = \text{predicted count}(f_i, \lambda)$
46 Christopher Manning. The Derivative III. $\frac{\partial \log P(C \mid D, \lambda)}{\partial \lambda_i} = \text{actual count}(f_i, C) - \text{predicted count}(f_i, \lambda)$. The optimum parameters are the ones for which each feature's predicted expectation equals its empirical expectation. The optimum distribution: is always unique (but parameters may not be unique); always exists (if feature counts are from actual data). These models are also called maximum entropy models because we find the model having maximum entropy and satisfying the constraints: $E_p(f_j) = E_{\tilde{p}}(f_j), \; \forall j$.
47 Christopher Manning. Finding the optimal parameters. We want to choose parameters λ1, λ2, λ3, ... that maximize the conditional log-likelihood of the training data: $\text{CLogLik}(D) = \sum_{i=1}^{n} \log P(c_i \mid d_i)$. To be able to do that, we've worked out how to calculate the function value and its partial derivatives (its gradient).
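Putting the last few slides together, the conditional log-likelihood and its gradient can be sketched as follows. This is a generic illustration, not the course's actual code:

```python
import math

def cond_probs(lambdas, feats, d, classes):
    # P(c|d) = exp(sum_i l_i f_i(c,d)) / sum_c' exp(sum_i l_i f_i(c',d))
    scores = {c: math.exp(sum(l * f for l, f in zip(lambdas, feats(c, d))))
              for c in classes}
    Z = sum(scores.values())
    return {c: s / Z for c, s in scores.items()}

def clog_lik(lambdas, feats, data, classes):
    return sum(math.log(cond_probs(lambdas, feats, d, classes)[c])
               for c, d in data)

def gradient(lambdas, feats, data, classes):
    # actual count(f_i, C) - predicted count(f_i, lambda), per feature i
    g = [0.0] * len(lambdas)
    for c, d in data:
        p = cond_probs(lambdas, feats, d, classes)
        for i, fv in enumerate(feats(c, d)):
            g[i] += fv                       # empirical count
        for c2 in classes:
            for i, fv in enumerate(feats(c2, d)):
                g[i] -= p[c2] * fv           # predicted count
    return g

# Toy model: one feature that fires for class "Y"; 2 Y's and 1 X observed
feats = lambda c, d: [1.0 if c == "Y" else 0.0]
data = [("Y", None), ("Y", None), ("X", None)]
print(gradient([0.0], feats, data, ["X", "Y"]))  # [0.5] = 2 - 3 * 0.5
```

At λ = 0 the model assigns P(Y|d) = 0.5, so the predicted count is 1.5 while the actual count is 2, giving a positive gradient: the weight should increase.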
48 Christopher Manning. A likelihood surface. [Figure lost in transcription.]
49 Christopher Manning. Finding the optimal parameters. Use your favorite numerical optimization package. Commonly (and in our code), you minimize the negative of CLogLik. 1. Gradient descent (GD); Stochastic gradient descent (SGD). 2. Iterative proportional fitting methods: Generalized Iterative Scaling (GIS) and Improved Iterative Scaling (IIS). 3. Conjugate gradient (CG), perhaps with preconditioning. 4. Quasi-Newton methods: limited memory variable metric (LMVM) methods, in particular, L-BFGS.
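A minimal sketch of option 1 (plain gradient ascent on CLogLik, i.e., gradient descent on its negative) for a tiny one-feature model; in practice you would hand the function value and gradient to an L-BFGS routine instead. The data set and step size are illustrative choices:

```python
import math

# One binary feature that fires for class "Y"; the data contain 2 Y's and
# 1 X, so at the optimum: predicted count of f = actual count = 2.
def p_y(lam):
    # P(Y|d) = e^lam / (e^lam + e^0), for every datum d
    return math.exp(lam) / (math.exp(lam) + 1.0)

lam = 0.0
for _ in range(500):
    grad = 2 - 3 * p_y(lam)   # actual count - predicted count
    lam += 0.5 * grad         # ascend the conditional log-likelihood
print(round(p_y(lam), 3))     # 0.667: model expectation matches 2/3
```

At convergence the gradient is zero, which is exactly the constraint from the derivative slides: the model's expected feature count equals the empirical count.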
50 Maxent Models and Discriminative Estimation. Maximizing the likelihood.
51 Maxent Models and Discriminative Estimation. The maximum entropy model presentation.
52 Christopher Manning. Maximum Entropy Models. An equivalent approach: Lots of distributions out there, most of them very spiked, specific, overfit. We want a distribution which is uniform except in specific ways we require. Uniformity means high entropy; we can search for distributions which have properties we desire, but also have high entropy. "Ignorance is preferable to error and he is less remote from the truth who believes nothing than he who believes what is wrong" (Thomas Jefferson, 1781).
53 Christopher Manning. Maximum Entropy. Entropy: the uncertainty of a distribution. Quantifying uncertainty ("surprise"): for an event x with probability p_x, the surprise is log(1/p_x). Entropy: expected surprise over p: $H(p) = E_p\!\left[\log_2 \frac{1}{p_x}\right] = -\sum_x p_x \log_2 p_x$. [Graph of H(p) against P(HEADS) lost in transcription.] A coin flip is most uncertain for a fair coin.
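Entropy as expected surprise, in a few lines:

```python
import math

def entropy(p):
    # H(p) = sum_x p_x * log2(1/p_x); terms with p_x = 0 contribute 0
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)

print(entropy([0.5, 0.5]))  # 1.0: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))  # < 1: a biased coin is less surprising on average
print(entropy([1.0, 0.0]))  # 0.0: a certain outcome carries no surprise
```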
54 Christopher Manning. Maxent Examples I. What do we want from a distribution? Minimize commitment = maximize entropy. Resemble some reference distribution (data). Solution: maximize entropy H, subject to feature-based constraints (e.g., the constraint that p(HEADS) = 0.3; unconstrained, the max is at 0.5). Adding constraints (features): Lowers maximum entropy. Raises maximum likelihood of data. Brings the distribution further from uniform. Brings the distribution closer to data.
55 Christopher Manning. Maxent Examples II. [Graphs of H(p_H, p_T), the constraint p_H + p_T = 1, the constraint p_H = 0.3, and the curve −x log x with its maximum at 1/e; lost in transcription.]
56 Christopher Manning. Maxent Examples III. Let's say we have the following event space: NN, NNS, NNP, NNPS, VBZ, VBD, and some empirical data (per-tag counts; the values were lost in transcription). Maximize H: each cell gets value 1/e (the unconstrained maximum of −x log x). But we want probabilities, so add the constraint E[NN + NNS + NNP + NNPS + VBZ + VBD] = 1, giving 1/6 for each tag.
57 Christopher Manning. Maxent Examples IV. Too uniform! N* are more common than V*, so we add the feature f_N = {NN, NNS, NNP, NNPS}, with E[f_N] = 32/36, giving: NN 8/36, NNS 8/36, NNP 8/36, NNPS 8/36, VBZ 2/36, VBD 2/36. ...and proper nouns are more frequent than common nouns, so we add f_P = {NNP, NNPS}, with E[f_P] = 24/36, giving: NN 4/36, NNS 4/36, NNP 12/36, NNPS 12/36, VBZ 2/36, VBD 2/36. ...we could keep refining the models, e.g., by adding a feature to distinguish singular vs. plural nouns, or verb types.
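The final distribution on this slide can be recovered numerically by fitting an exponential model whose feature expectations match the two constraints. This is a sketch using plain gradient ascent; the step size and iteration count are arbitrary illustrative choices:

```python
import math

tags = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
f_N = {t: 1.0 if t.startswith("N") else 0.0 for t in tags}     # noun feature
f_P = {t: 1.0 if t in ("NNP", "NNPS") else 0.0 for t in tags}  # proper noun
target_N, target_P = 32 / 36, 24 / 36

lam_N = lam_P = 0.0
p = {}
for _ in range(3000):
    scores = {t: math.exp(lam_N * f_N[t] + lam_P * f_P[t]) for t in tags}
    Z = sum(scores.values())
    p = {t: s / Z for t, s in scores.items()}
    # gradient = target expectation - model expectation, per feature
    lam_N += 0.5 * (target_N - sum(p[t] * f_N[t] for t in tags))
    lam_P += 0.5 * (target_P - sum(p[t] * f_P[t] for t in tags))

print({t: round(p[t] * 36, 2) for t in tags})
# NN, NNS: 4; NNP, NNPS: 12; VBZ, VBD: 2 (in 36ths), as on the slide
```

Within each constraint cell, the fitted exponential model spreads mass uniformly, which is exactly the maximum entropy behavior.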
58 Christopher Manning. Convexity. [Figure: convex vs. non-convex functions.] Convexity guarantees a single, global maximum because any higher points are greedily reachable.
59 Christopher Manning. Convexity II. Constrained H(p) = −Σ x log x is convex: −x log x is convex; −Σ x log x is convex (a sum of convex functions is convex). The feasible region of constrained H is a linear subspace (which is convex). The constrained entropy surface is therefore convex. The maximum likelihood exponential model (dual) formulation is also convex.
60 Feature Overlap/Feature Interaction. How overlapping features work in maxent models.
61 Christopher Manning. Feature Overlap. Maxent models handle overlapping features well. Unlike a NB model, there is no double counting! Empirical counts over (A, a) × (B, b): row B: 2, 1; row b: 2, 1. With no features (All = 1): every cell 1/4. With feature A (constraint E[A] = 2/3): column A cells 1/3, column a cells 1/6. Adding a second copy of feature A (A + A) leaves the distribution unchanged: column A cells 1/3, column a cells 1/6. [Parameter tables: with one copy, the A cells carry weight λ_A; with two copies, the same total weight λ_A + λ'_A is split between them.]
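The no-double-counting claim can be checked numerically: fitting one copy of feature A and two identical copies gives the same distribution, because only the sum of the copies' weights matters. A sketch; the gradient-ascent details are arbitrary illustrative choices:

```python
import math

cells = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
f_A = lambda cell: 1.0 if cell[0] == "A" else 0.0
target = 2 / 3   # empirical E[A] from the counts B: 2, 1; b: 2, 1

def fit(n_copies):
    # Train n_copies identical copies of feature A by gradient ascent
    lams = [0.0] * n_copies
    p = {}
    for _ in range(2000):
        scores = {c: math.exp(sum(lams) * f_A(c)) for c in cells}
        Z = sum(scores.values())
        p = {c: s / Z for c, s in scores.items()}
        e_model = sum(p[c] * f_A(c) for c in cells)
        for i in range(n_copies):
            lams[i] += 0.3 * (target - e_model)
    return p

p1, p2 = fit(1), fit(2)
print([round(p1[c], 3) for c in cells])  # [0.333, 0.333, 0.167, 0.167]
print([round(p2[c], 3) for c in cells])  # the same distribution
```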
62 Christopher Manning. Example: Named Entity Feature Overlap. Grace is correlated with PERSON, but does not add much evidence on top of already knowing prefix features. [Local context table: Prev = at (state Other), Cur = Grace, Next = Road; tags IN, NNP, NNP; signatures x, Xx, Xx. Feature weights table listing: Previous word = at; Current word = Grace; Beginning bigram = <G; Current POS tag = NNP; Prev and cur tags = IN NNP; Previous state = Other; Current signature = Xx; Prev state, cur sig = O-Xx; Prev-cur-next sig = x-Xx-Xx; P. state - p-cur sig = O-x-Xx. The weight values were lost in transcription.]
63 Christopher Manning. Feature Interaction. Maxent models handle overlapping features well, but do not automatically model feature interactions. Empirical counts: row B: 1, 1; row b: 1, 0. With no features (All = 1): every cell 1/4. With feature A (E[A] = 2/3): column A cells 1/3, column a cells 1/6. Adding feature B (E[B] = 2/3): (A, B) 4/9, (a, B) 2/9, (A, b) 2/9, (a, b) 1/9. [Parameter tables: cell weights are λ_A, λ_B, and λ_A + λ_B; the (a, b) cell gets 0.]
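The 4/9, 2/9, 2/9, 1/9 table can be reproduced by fitting features A and B with no interaction term. A sketch; the training-loop details are arbitrary illustrative choices:

```python
import math

cells = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
f_A = lambda c: 1.0 if c[0] == "A" else 0.0
f_B = lambda c: 1.0 if c[1] == "B" else 0.0
targets = [2 / 3, 2 / 3]  # empirical E[A], E[B] from counts B: 1, 1; b: 1, 0

lams, feats = [0.0, 0.0], [f_A, f_B]
p = {}
for _ in range(3000):
    scores = {c: math.exp(sum(l * f(c) for l, f in zip(lams, feats)))
              for c in cells}
    Z = sum(scores.values())
    p = {c: s / Z for c, s in scores.items()}
    for i, f in enumerate(feats):
        lams[i] += 0.3 * (targets[i] - sum(p[c] * f(c) for c in cells))

print([round(p[c], 3) for c in cells])
# [0.444, 0.222, 0.222, 0.111]: A and B enter independently, so the model
# cannot reproduce the empirical zero in the (a, b) cell
```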
64 Christopher Manning. Feature Interaction. If you want interaction terms, you have to add them: Empirical counts: row B: 1, 1; row b: 1, 0. With A = 2/3: column A cells 1/3, column a cells 1/6. Adding B = 2/3: (A, B) 4/9, (a, B) 2/9, (A, b) 2/9, (a, b) 1/9. Adding AB = 1/3: (A, B) 1/3, (a, B) 1/3, (A, b) 1/3, (a, b) 0. A disjunctive feature would also have done it (alone): (A, B) 1/3, (a, B) 1/3, (A, b) 1/3, (a, b) 0.
65 Christopher Manning. Suppose we have a 1-feature maxent model built over observed data as shown. What is the constructed model's probability distribution over the four possible outcomes? Empirical counts: row B: 2, 1; row b: 2, 1. [The feature definition, expectations, and probabilities boxes were lost in transcription.]
66 Christopher Manning. Feature Interaction. For loglinear/logistic regression models in statistics, it is standard to do a greedy stepwise search over the space of all possible interaction terms. This combinatorial space is exponential in size, but that's okay, as most statistics models only have 4 to 8 features. In NLP, our models commonly use hundreds of thousands of features, so that's not okay. Commonly, interaction terms are added by hand, based on linguistic intuitions.
67 Christopher Manning. Example: NER Interaction. Previous-state and current-signature have interactions, e.g. P=PERS-C=Xx indicates C=PERS much more strongly than C=Xx and P=PERS independently. This feature type allows the model to capture this interaction. [Local context and feature weights tables as in the earlier NER slide; the weight values were lost in transcription.]
The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationLaboratory 1c: Method of Least Squares
Lab 1c, Least Squares Laboratory 1c: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly
More informationCSE 546 Midterm Exam, Fall 2014(with Solution)
CSE 546 Mdterm Exam, Fall 014(wth Soluton) 1. Personal nfo: Name: UW NetID: Student ID:. There should be 14 numbered pages n ths exam (ncludng ths cover sheet). 3. You can use any materal you brought:
More informationCS286r Assign One. Answer Key
CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far
More information= z 20 z n. (k 20) + 4 z k = 4
Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5
More informationWe present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}.
CS 189 Introducton to Machne Learnng Sprng 2018 Note 26 1 Boostng We have seen that n the case of random forests, combnng many mperfect models can produce a snglodel that works very well. Ths s the dea
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationProbability Theory (revisited)
Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More information} Often, when learning, we deal with uncertainty:
Uncertanty and Learnng } Often, when learnng, we deal wth uncertanty: } Incomplete data sets, wth mssng nformaton } Nosy data sets, wth unrelable nformaton } Stochastcty: causes and effects related non-determnstcally
More informationUVA CS / Introduc8on to Machine Learning and Data Mining
UVA CS 4501-001 / 6501 007 Introduc8on to Machne Learnng and Data Mnng Lecture 16: Genera,ve vs. Dscrmna,ve / K- nearest- neghbor Classfer / LOOCV Yanjun Q / Jane,, PhD Unversty of Vrgna Department of
More informationResource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud
Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal
More information18-660: Numerical Methods for Engineering Design and Optimization
8-66: Numercal Methods for Engneerng Desgn and Optmzaton n L Department of EE arnege Mellon Unversty Pttsburgh, PA 53 Slde Overve lassfcaton Support vector machne Regularzaton Slde lassfcaton Predct categorcal
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationMultilayer Perceptron (MLP)
Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationNote on EM-training of IBM-model 1
Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More informationCorrelation and Regression. Correlation 9.1. Correlation. Chapter 9
Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationSemi-Supervised Learning
Sem-Supervsed Learnng Consder the problem of Prepostonal Phrase Attachment. Buy car wth money ; buy car wth wheel There are several ways to generate features. Gven the lmted representaton, we can assume
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationMultilayer Perceptrons and Backpropagation. Perceptrons. Recap: Perceptrons. Informatics 1 CG: Lecture 6. Mirella Lapata
Multlayer Perceptrons and Informatcs CG: Lecture 6 Mrella Lapata School of Informatcs Unversty of Ednburgh mlap@nf.ed.ac.uk Readng: Kevn Gurney s Introducton to Neural Networks, Chapters 5 6.5 January,
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationAnswers Problem Set 2 Chem 314A Williamsen Spring 2000
Answers Problem Set Chem 314A Wllamsen Sprng 000 1) Gve me the followng crtcal values from the statstcal tables. a) z-statstc,-sded test, 99.7% confdence lmt ±3 b) t-statstc (Case I), 1-sded test, 95%
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models
More informationChapter 9: Statistical Inference and the Relationship between Two Variables
Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,
More informationLaboratory 3: Method of Least Squares
Laboratory 3: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly they are correlated wth
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationAS-Level Maths: Statistics 1 for Edexcel
1 of 6 AS-Level Maths: Statstcs 1 for Edecel S1. Calculatng means and standard devatons Ths con ndcates the slde contans actvtes created n Flash. These actvtes are not edtable. For more detaled nstructons,
More information8/25/17. Data Modeling. Data Modeling. Data Modeling. Patrice Koehl Department of Biological Sciences National University of Singapore
8/5/17 Data Modelng Patrce Koehl Department of Bologcal Scences atonal Unversty of Sngapore http://www.cs.ucdavs.edu/~koehl/teachng/bl59 koehl@cs.ucdavs.edu Data Modelng Ø Data Modelng: least squares Ø
More informationCourse 395: Machine Learning - Lectures
Course 395: Machne Learnng - Lectures Lecture 1-2: Concept Learnng (M. Pantc Lecture 3-4: Decson Trees & CC Intro (M. Pantc Lecture 5-6: Artfcal Neural Networks (S.Zaferou Lecture 7-8: Instance ased Learnng
More informationRandomness and Computation
Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationMachine Learning & Data Mining CS/CNS/EE 155. Lecture 4: Regulariza5on, Sparsity & Lasso
Machne Learnng Data Mnng CS/CS/EE 155 Lecture 4: Regularza5on, Sparsty Lasso 1 Recap: Complete Ppelne S = {(x, y )} Tranng Data f (x, b) = T x b Model Class(es) L(a, b) = (a b) 2 Loss Func5on,b L( y, f
More informationImage classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them?
Image classfcaton Gven te bag-of-features representatons of mages from dfferent classes ow do we learn a model for dstngusng tem? Classfers Learn a decson rule assgnng bag-offeatures representatons of
More informationLecture 3: Dual problems and Kernels
Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM
More informationSDMML HT MSc Problem Sheet 4
SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs246.stanford.edu 2/19/18 Jure Leskovec, Stanford CS246: Mnng Massve Datasets, http://cs246.stanford.edu 2 Hgh dm. data Graph data Infnte
More informationVapnik-Chervonenkis theory
Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown
More informationANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)
Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of
More information