Learning from Data 1 Naive Bayes
Learning from Data 1: Naive Bayes
David Barber  dbarber@anc.ed.ac.uk
course page : dbarber/lfd1/lfd1.html
© David Barber 2001
1 Why Naive Bayes?

Naive Bayes is one of the simplest density estimation methods from which we can form one of the standard classification methods in machine learning. Its fame is partly due to the following properties:

- Very easy to program and intuitive
- Fast to train and to use as a classifier
- Very easy to deal with missing attributes
- Very popular in certain fields such as computational linguistics/NLP

However, despite the simplicity of Naive Bayes, there are some pitfalls that need to be avoided, as we will describe. The pitfalls usually made are due to a poor understanding of the central assumption behind Naive Bayes, namely conditional independence.

2 Understanding Conditional Independence

Before we explain how to use conditional independence to form a classifier, we concentrate on explaining the basic assumption of conditional independence itself. Consider a general probability distribution of two variables, p(x_1, x_2). Using Bayes' rule, without loss of generality, we can write

p(x_1, x_2) = p(x_1|x_2) p(x_2)    (2.1)

Similarly, if we had another variable, a class variable c, we could write, using Bayes' rule,

p(x_1, x_2|c) = p(x_1|x_2, c) p(x_2|c)    (2.2)

In the above expression we have not made any assumptions at all. Consider now the term p(x_1|x_2, c). If knowledge of c is sufficient to determine how x_1 will be distributed, we don't need to know the state of x_2; that is, we may write p(x_1|x_2, c) = p(x_1|c). For example, we may write the general statement

p(cloudy, windy|storm) = p(cloudy|windy, storm) p(windy|storm)    (2.3)

where, for example, each of the variables can take the values "yes" or "no", and now further make the assumption p(cloudy|windy, storm) = p(cloudy|storm), so that the distribution becomes

p(cloudy, windy|storm) = p(cloudy|storm) p(windy|storm)    (2.4)

We can generalise the situation of two variables to a conditional independence assumption for a set of variables x_1, ..., x_N, conditional on another variable c:

p(x|c) = ∏_{i=1}^{N} p(x_i|c)    (2.5)

A further example may help to clarify the assumption behind conditional independence. EasySell.com considers that its customers conveniently fall into two groups: the young and the old. Based on only this information, they build general customer profiles for product preferences. EasySell.com assumes that, given the knowledge that a customer is either young or old, this is sufficient to determine whether or not a customer will like a product, independent of their likes or dislikes for any other products. Thus, given that a customer is young, she has a 95% chance to like Radio1, a 5% chance to like Radio2, a 2% chance to like Radio3 and a 20% chance to like Radio4. Similarly, they model that an old customer has a 3% chance to like Radio1, an 82% chance to like Radio2, a 34% chance to like Radio3 and a 92% chance to like Radio4. Mathematically, we would write

p(R1, R2, R3, R4|age) = p(R1|age) p(R2|age) p(R3|age) p(R4|age)    (2.6)

where each of the variables R1, R2, R3, R4 can take the value either "like" or "dislike", and the age variable can take the value either "young" or "old".
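To make the factorisation in equation (2.6) concrete, here is a minimal sketch (in the same Matlab style as the code later in these notes) that evaluates the EasySell.com profile for one particular preference vector. The probabilities are the ones quoted above; the preference vector R and the variable names are invented for illustration.

% Conditional independence in action: p(R1,R2,R3,R4|age) as a product of factors.
pLikeYoung = [0.95 0.05 0.02 0.20];   % p(Ri = like | age = young), values from the text
pLikeOld   = [0.03 0.82 0.34 0.92];   % p(Ri = like | age = old)
R = [1 0 0 1];                        % an invented preference vector: like, dislike, dislike, like
% Under the assumption (2.6), the joint conditional probability is just a product of factors.
pR_young = prod(pLikeYoung.^R .* (1-pLikeYoung).^(1-R));
pR_old   = prod(pLikeOld.^R   .* (1-pLikeOld).^(1-R));

Note that only the age group is needed to evaluate the joint preference probability; that is exactly what the conditional independence assumption buys us.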
Thus the information about the age of the customer is so powerful that it determines the individual product preferences without needing to know anything else. Clearly, this is a rather strong assumption, but a popular one, and it sometimes leads to surprisingly good results.

In this chapter, we will take the conditioning variable to represent the class of the datapoint x. Coupled with a suitable choice for the conditional distribution p(x_i|c), we can then use Bayes' rule to form a classifier. We will consider two different conditional distributions, one appropriate for discrete data and the other for continuous data, and we will demonstrate how to learn any free parameters of these models.

3 Are they Scottish?

Consider the following vector of attributes:

(likes shortbread, likes lager, drinks whiskey, eats porridge, watched England play football)^T    (3.1)

A vector x = (1, 0, 1, 1, 0)^T would describe that a person likes shortbread, does not like lager, drinks whiskey, eats porridge, and has not watched England play football. Together with each vector x^μ there is a class label describing the nationality of the person: Scottish or English. We wish to classify a new vector x* = (1, 0, 1, 1, 0)^T as either Scottish or English. We can use Bayes' rule to calculate the probability that x* is Scottish or English:

p(S|x*) = p(x*|S) p(S) / p(x*)    (3.2)

p(E|x*) = p(x*|E) p(E) / p(x*)    (3.3)

Since we must have p(S|x*) + p(E|x*) = 1, we could also write

p(S|x*) = p(x*|S) p(S) / ( p(x*|S) p(S) + p(x*|E) p(E) )    (3.4)

It is straightforward to show that the prior class probability p(S) is simply given by the fraction of people in the database who are Scottish, and similarly p(E) is given by the fraction who are English. What about p(x*|S)? This is where our density model for x comes in. In the previous chapter we looked at using a Gaussian distribution. Here we will make a different, very strong conditional independence assumption:

p(x|S) = p(x_1|S) p(x_2|S) ... p(x_5|S)    (3.5)

What this assumption means is that, knowing whether or not someone is Scottish, we don't need to know anything else to calculate the probability of their likes and dislikes. Matlab code to implement Naive Bayes on a small dataset is written below, where each column of the datasets is a vector of attributes of the form of equation (3.1).
% Naive Bayes using Bernoulli Distribution
xE = [0 1 1 1 0 0;        % english data: one column per person,
      0 0 1 1 1 0;        % one row per attribute of equation (3.1)
      1 1 0 0 0 0;        % (entries consistent with the counts quoted in Section 3.1)
      1 1 0 0 0 1;
      1 0 1 0 1 0];
xS = [1 1 1 1 1 1 1;      % scottish data
      0 1 1 1 1 0 0;
      0 0 1 0 0 1 1;
      1 0 1 1 1 1 0;
      1 1 0 0 1 0 0];
pE = size(xE,2)/(size(xE,2) + size(xS,2)); pS = 1 - pE;  % ML class priors pE = p(c=E), pS = p(c=S)
mE = mean(xE')';                      % ML estimates of p(x=1|c=E)
mS = mean(xS')';                      % ML estimates of p(x=1|c=S)
x = [1 0 1 1 0]';                     % test point
npE = pE*prod(mE.^x.*(1-mE).^(1-x));  % p(x,c=E)
npS = pS*prod(mS.^x.*(1-mS).^(1-x));  % p(x,c=S)
pxE = npE/(npE+npS)                   % probability that x is english

3.1 Further Issues

Based on the training data in the code above, we have the following:

p(x_1=1|E) = 1/2, p(x_2=1|E) = 1/2, p(x_3=1|E) = 1/3, p(x_4=1|E) = 1/2, p(x_5=1|E) = 1/2,
p(x_1=1|S) = 1, p(x_2=1|S) = 4/7, p(x_3=1|S) = 3/7, p(x_4=1|S) = 5/7, p(x_5=1|S) = 3/7,

and the prior probabilities are p(S) = 7/13 and p(E) = 6/13. For x* = (1, 0, 1, 1, 0)^T we get

p(S|x*) = [ 1 × (1 - 4/7) × 3/7 × 5/7 × (1 - 3/7) × 7/13 ] / [ 1 × (1 - 4/7) × 3/7 × 5/7 × (1 - 3/7) × 7/13 + 1/2 × (1 - 1/2) × 1/3 × 1/2 × (1 - 1/2) × 6/13 ]    (3.6)

which is approximately 0.8076. Since this is greater than 0.5, we would classify this person as Scottish.

Consider trying to classify the vector x* = (0, 1, 1, 1, 1)^T. In the training data, all Scottish people say they like shortbread. This means that p(x*, S) = 0, and hence that p(S|x*) = 0. This demonstrates a difficulty with sparse data: very extreme class probabilities can result. One way to ameliorate this situation is to smooth the probabilities in some way, for example by adding a small number M to the frequency counts of each class:

p(x_i=1|c) = (number of times x_i=1 for class c + M) / ( (number of times x_i=1 for class c + M) + (number of times x_i=0 for class c + M) )    (3.7)

This ensures that there are no zero probabilities in the model.

3.2 Gaussians

Fitting continuous data is also straightforward using Naive Bayes. For example, if we were to model each attribute's distribution as a Gaussian, p(x_i|c) = N(μ_i, σ_i), this would be exactly equivalent to using the conditional Gaussian density estimator of the previous chapter with a covariance matrix in which all elements are zero except those on the diagonal.
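To illustrate the Gaussian case, here is a minimal sketch that fits an independent Gaussian to each attribute of each class and classifies a test point. The data, the attribute meanings and the variable names (dE, dS, xtest) are invented for illustration and are not part of the original example.

% Gaussian Naive Bayes sketch: one univariate Gaussian per attribute and class.
% dE, dS: attributes (rows) x datapoints (columns), continuous values (invented data).
dE = [170 175 180 168; 70 82 77 74];          % e.g. two continuous attributes for class E
dS = [178 182 176 185 179; 85 90 80 95 88];   % and for class S
muE = mean(dE,2); sE = std(dE,1,2);           % ML estimates: mean and std per attribute
muS = mean(dS,2); sS = std(dS,1,2);
pE = size(dE,2)/(size(dE,2)+size(dS,2)); pS = 1-pE;   % ML class priors
xtest = [177; 84];                            % test point
% log p(x,c) = sum_i log N(x_i; mu_i, sigma_i^2) + log p(c)
logE = sum(-0.5*log(2*pi*sE.^2) - (xtest-muE).^2./(2*sE.^2)) + log(pE);
logS = sum(-0.5*log(2*pi*sS.^2) - (xtest-muS).^2./(2*sS.^2)) + log(pS);
pSx = 1/(1+exp(logE-logS))                    % p(c=S|xtest)

Because each attribute has its own independent mean and variance, this is the same computation one would obtain from the conditional Gaussian estimator of the previous chapter with a diagonal covariance matrix.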
3.3 Text Classification

Naive Bayes has often been applied to classify documents into classes. We outline here how this is done; refer to a computational linguistics course for the details.

Consider a set of documents about politics, and a set about sport. We search through all documents to find, say, the 100 most commonly occurring words. Each document is then represented by a 100-dimensional vector holding the number of times each of these words occurs in that document, the so-called bag of words representation (this is clearly a very crude representation, since it takes no account of word order). We then fit a Naive Bayes model by fitting a distribution of the number of occurrences of each word for all the documents of, first, sport, and then politics. This completes the model.

The reason Naive Bayes may be able to classify documents reasonably well in this way is that the conditional independence assumption is not so silly: if we know people are talking about politics, this is perhaps almost sufficient information to specify what kinds of other words they will be using; we don't need to know anything else. (Of course, if you ultimately want a more powerful text classifier, you need to relax this assumption.)

4 Pitfalls with Naive Bayes: 1-of-M encoding

So far we have described how to implement Naive Bayes for the case of binary attributes and also for the case of Gaussian continuous attributes. However, the software that people commonly use often requires the data to be in the form of binary attributes, and it is in the transformation of non-binary data to a binary form that a common mistake occurs.

Consider the following attribute: age. In a survey, a person's age is marked down using a variable a ∈ {1, 2, 3}: a = 1 means the person is between 0 and 10 years old, a = 2 means the person is between 10 and 20 years old, and a = 3 means the person is older than 20. Perhaps there would be other attributes in the data, so that each data entry is a vector of two variables (a, b)^T. One way to transform the variable a into a binary representation would be to use three binary variables (a_1, a_2, a_3): (1, 0, 0) represents a = 1, (0, 1, 0) represents a = 2 and (0, 0, 1) represents a = 3. This is called 1-of-M coding, since only one of the binary variables is active in encoding the M states. The problem is that this encoding, by construction, makes the variables a_1, a_2, a_3 dependent: for example, if we know that a_1 = 1, we know that a_2 = 0 and a_3 = 0. Regardless of any possible conditioning, these variables will always remain completely dependent, contrary to the assumption of Naive Bayes. This mistake, however, is widespread; please help preserve a little of my sanity by not making the same error. The correct approach is simply to use variables with many states, that is the multinomial rather than binomial distribution. This is straightforward and left as an exercise for the interested reader; a brief sketch follows below.
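A minimal sketch of the multi-state treatment is the following; the counts, the smoothing constant and the variable names are invented for illustration, and the smoothing simply generalises equation (3.7) to K states. Rather than splitting a into three binary variables, we keep it as a single variable with three states and estimate p(a = k|c) from frequency counts.

% Multi-state (multinomial) attribute handled directly, without 1-of-M encoding.
a = [1 3 2 3 3 1 2 3];     % observed states of the age attribute for the datapoints of one class (invented)
M = 1;                     % smoothing constant, as in equation (3.7)
K = 3;                     % number of states of a
counts = [sum(a==1) sum(a==2) sum(a==3)];   % frequency of each state within the class
pA = (counts + M)/(sum(counts) + K*M);      % smoothed estimate of p(a=k|c), k = 1..K
% The likelihood contribution of a test value a* = 2 under this class is then simply pA(2),
% and it multiplies the factors from the other attributes exactly as in equation (2.5).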
5 Estimation using Maximum Likelihood: Bernoulli Process

In this section we formally derive how to learn the parameters in a Naive Bayes model from data. The results are intuitive, and indeed we have already made use of them in the previous sections. However, it is instructive to carry out this procedure, and some light can also be cast on the nature of the decision boundary (at least for the case of binary attributes).

Consider a dataset X = {x^μ, μ = 1, ..., P} of binary attributes, that is x_i^μ ∈ {0, 1}. Each datapoint x^μ has an associated class label c^μ. Based upon the class label, we can split the inputs into those that belong to each class: X_c = {x : x is in class c}. We will consider here only the case of two classes (this is called a Bernoulli process; the case of more classes is also straightforward, and is called the multinomial process). Let the number of datapoints from class c = 0 be n_0 and the number from class c = 1 be n_1. For each of the two classes, we then need to estimate the values p(x_i=1|c) ≡ θ_i^c. (The other probability, p(x_i=0|c), is simply given by the normalisation requirement, p(x_i=0|c) = 1 - p(x_i=1|c) = 1 - θ_i^c.) Using the standard assumption that the data are generated identically and independently, the likelihood of the model generating the dataset X_c (the data X belonging to class c) is

p(X_c) = ∏_{μ from class c} p(x^μ|c)    (5.1)

Using our conditional independence assumption,

p(x|c) = ∏_i p(x_i|c) = ∏_i (θ_i^c)^{x_i} (1 - θ_i^c)^{1 - x_i}    (5.2)

(remember that each x_i in the above expression is either 0 or 1, and hence, for each term in the product, only one of the two factors will contribute: a factor θ_i^c if x_i = 1 and 1 - θ_i^c if x_i = 0). Putting this all together, we can find the log likelihood

L(θ^c) = Σ_{i,μ} x_i^μ log θ_i^c + (1 - x_i^μ) log(1 - θ_i^c)    (5.3)

Optimising with respect to θ_i^c ≡ p(x_i=1|c) (differentiate with respect to θ_i^c and equate to zero) gives

p(x_i=1|c) = (number of times x_i=1 for class c) / ( (number of times x_i=1 for class c) + (number of times x_i=0 for class c) )    (5.4)

A similar Maximum Likelihood argument gives the intuitive result

p(c) = (number of times class c occurs) / (total number of datapoints)    (5.5)

5.1 Classification Boundary

If we just wish to find the most likely class for a new point x*, we can compare the log probabilities, classifying x* as class 1 if

log p(c=1|x*) > log p(c=0|x*)    (5.6)

Using the definition of the classifier, this is equivalent to (since the normalisation term log p(x*) is common to both sides and can be dropped)

log p(x*|c=1) + log p(c=1) > log p(x*|c=0) + log p(c=0)    (5.7)

Using the binary encoding x_i ∈ {0, 1}, we therefore classify x* as class 1 if

Σ_i { x_i* log θ_i^1 + (1 - x_i*) log(1 - θ_i^1) } + log p(c=1) > Σ_i { x_i* log θ_i^0 + (1 - x_i*) log(1 - θ_i^0) } + log p(c=0)    (5.8)

Note that this decision rule can be expressed in the form: classify x* as class 1 if Σ_i w_i x_i* + a > 0 for some suitable choice of weights w_i and constant a (the reader is invited to find the explicit values of these weights; a sketch is given below). The interpretation is that w specifies a hyperplane in the x space, and x* is classified as a 1 if it lies on one side of that hyperplane. We shall talk about other such linear classifiers in a later chapter.
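For the curious reader, one way to collect the terms of equation (5.8) into the linear form is sketched below: the weight multiplying x_i* is w_i = log(θ_i^1/θ_i^0) - log((1-θ_i^1)/(1-θ_i^0)), and the remaining terms form the constant a. The code uses the Scottish/English example from earlier, with class 1 = Scottish; the choice of smoothing constant M = 1 is ours, made only to avoid the infinite weight that the raw estimate p(x_1=1|S) = 1 would produce.

% Naive Bayes decision rule written as a linear classifier: w'*x + a > 0  <=>  class 1.
theta1 = ([7 4 3 5 3]' + 1)/(7 + 2);  % smoothed p(x_i=1|c=1), from the 7 Scottish datapoints
theta0 = ([3 3 2 3 3]' + 1)/(6 + 2);  % smoothed p(x_i=1|c=0), from the 6 English datapoints
pc1 = 7/13; pc0 = 6/13;               % class priors
w = log(theta1./theta0) - log((1-theta1)./(1-theta0));
a = sum(log((1-theta1)./(1-theta0))) + log(pc1/pc0);
x = [1 0 1 1 0]';                     % the test point from Section 3.1
class1 = (w'*x + a > 0)               % 1 means classify as Scottish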