Lecture 3 Naïve Bayes, Maximum Entropy and Text Classification COSI 134

Leonard Mathews
6 years ago

1 Lecture 3 Naïve Bayes, Maximum Entropy and Text Classification COSI 134

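2 Conditional Parameterization
Two RVs: Intelligence (I) and SAT (S), with Val(I) = {High, Low} and Val(S) = {High, Low}.
A possible joint distribution:
I     S     P(I, S)
Low   Low   …
Low   High  …
High  Low   0.06
High  High  0.24
Can describe using the chain rule as $P(I, S) = P(I)\,P(S \mid I)$.
This needs a table for $P(I)$ (entries I=Low, I=High) and a CPT for $P(S \mid I)$ (rows I=Low, I=High; columns S=Low, S=High).
Graph: Intel → SAT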
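The chain-rule factorization can be sketched in a few lines of Python. Only the two joint entries 0.06 and 0.24 come from the slide; P(I = High) = 0.30 and the I = High row of the CPT follow from them by marginalization, while the I = Low row of the CPT is an assumed value chosen purely for illustration.

```python
# Prior P(I). P(I=High) = 0.06 + 0.24 = 0.30 follows from the slide's joint entries.
p_i = {"Low": 0.7, "High": 0.3}

# CPT P(S | I). The High row is 0.06/0.30 and 0.24/0.30.
# The Low row is an ASSUMED value, not taken from the slide.
p_s_given_i = {
    "Low": {"Low": 0.95, "High": 0.05},   # assumed for illustration
    "High": {"Low": 0.2, "High": 0.8},
}

def joint(i, s):
    """Chain rule: P(I=i, S=s) = P(I=i) * P(S=s | I=i)."""
    return p_i[i] * p_s_given_i[i][s]

# Rebuild the full joint table from the two smaller tables.
joint_table = {(i, s): joint(i, s) for i in p_i for s in ("Low", "High")}
```

The two factors hold three free parameters (one for P(I), two for P(S | I)), the same number as the four-entry joint, so for two variables the factorization changes the representation but not its size.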

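3 Conditional Independence
Assume another RV, Grade (G): the grade in some course, with Val(G) = {High, Medium, Low}.
Graph: Intel → SAT, Intel → Grade
Might assume that G is conditionally independent of S given I: $P(G \mid I, S) = P(G \mid I)$
Then: $P(I, S, G) = P(S, G \mid I)\,P(I)$
By cond. indep.: $P(S, G \mid I) = P(S \mid I)\,P(G \mid I)$
So: $P(I, S, G) = P(S \mid I)\,P(G \mid I)\,P(I)$
Another CPT for $P(G \mid I)$ (rows I=Low, I=High; columns G=High, G=Med, G=Low).
More compact than the full joint.
Possible to update the joint with new information.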
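A small sketch can make the "more compact than the full joint" claim concrete. All probabilities below are assumed for illustration (the slide's table entries are not reproduced here); the code builds the joint from the three factors, checks that G is independent of S given I, and counts free parameters.

```python
# All numbers are ASSUMED for illustration only.
p_i = {"Low": 0.7, "High": 0.3}
p_s_given_i = {"Low": {"Low": 0.95, "High": 0.05},
               "High": {"Low": 0.2, "High": 0.8}}
p_g_given_i = {"Low": {"High": 0.2, "Med": 0.34, "Low": 0.46},
               "High": {"High": 0.74, "Med": 0.17, "Low": 0.09}}
GRADES = ("High", "Med", "Low")

def joint(i, s, g):
    """P(I, S, G) = P(I) * P(S | I) * P(G | I)."""
    return p_i[i] * p_s_given_i[i][s] * p_g_given_i[i][g]

def p_g_given_i_s(g, i, s):
    """P(G=g | I=i, S=s), computed from the joint rather than read from the CPT."""
    denom = sum(joint(i, s, g2) for g2 in GRADES)
    return joint(i, s, g) / denom

# Free parameters: the full joint has 2*2*3 entries that sum to 1.
full_joint_params = 2 * 2 * 3 - 1
# Factored form: P(I) has 1, P(S|I) has 1 per value of I, P(G|I) has 2 per value of I.
factored_params = 1 + 2 * 1 + 2 * 2
```

With these cardinalities the factored model needs 7 free parameters against 11 for the full joint, and the gap widens as more variables are added.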

4 Statistical Modeling: Four Questions
1 What is the form of the model? What random variables? How are probabilities computed? What distributions? What parameters?
2 Given a set of data items from the sample space, how is the likelihood of that data computed, for the given model structure and parameter values?
3 Given a likelihood function, how are the optimal parameters estimated given a set of data?
4 Given a model form and a set of induced parameter values, how is inference performed in the model to make predictions/ask queries?

5 Random Variable Distributions
Bernoulli Distribution: outcome is success (1) or failure (0). Success with probability $p$. Probability mass function: $P(X = 1) = p$, $P(X = 0) = 1 - p$
Categorical Distribution: outcome is one of a finite number of categories. Probability mass function: $P(X = i) = p_i$
Binomial Distribution is a series of Bernoulli trials.
Multinomial Distribution is a series of Categorical trials.
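The relationship between these distributions can be illustrated with a minimal sketch. The function names and the example values of p are assumptions made for this illustration.

```python
import math

def bernoulli_pmf(x, p):
    """P(X = x) for a single success/failure trial with success probability p."""
    return p if x == 1 else 1.0 - p

def categorical_pmf(i, probs):
    """P(X = i) for one draw over a finite set of categories."""
    return probs[i]

def binomial_pmf(k, n, p):
    """P(k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
```

With n = 1 the binomial reduces to the Bernoulli, for example `binomial_pmf(1, 1, 0.3)` and `bernoulli_pmf(1, 0.3)` agree; the multinomial relates to the categorical in the same way.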

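6 Naïve Bayes
Very simple, but effective probabilistic classifier:
$P(C \mid X_1, \ldots, X_n) = \frac{P(X_1, \ldots, X_n \mid C)\,P(C)}{P(X_1, \ldots, X_n)}$
But how do we calculate $P(X_1, \ldots, X_n \mid C)$?
Naïve Bayes Assumption: $P(X_1, \ldots, X_n \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$
Each observed variable is assumed to be independent of the others given the class.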

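7 Naïve Bayes Inference
First, note that to use the model in most settings, we do not need to explicitly compute $P(X_1, \ldots, X_n)$.
The denominator can be ignored since the data are given and it is the same across all classes $c$.
We are interested in:
$\arg\max_c P(c \mid x_1, \ldots, x_n) = \arg\max_c \frac{P(x_1, \ldots, x_n \mid c)\,P(c)}{P(x_1, \ldots, x_n)} = \arg\max_c P(c) \prod_{i=1}^{n} P(x_i \mid c)$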
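The point that the denominator does not affect the arg max can be checked with a short sketch. The class names are borrowed from the document-classification example; the priors, the conditional probabilities, and the use of log-space scores (a common way to avoid underflow) are assumptions for illustration.

```python
import math

# ASSUMED toy parameters: two classes, three binary observed variables.
prior = {"FINANCE": 0.5, "SPORTS": 0.5}
cond = {  # cond[c][i] = P(X_i = 1 | c)
    "FINANCE": [0.8, 0.6, 0.1],
    "SPORTS": [0.1, 0.2, 0.7],
}

def log_score(c, x):
    """log P(c) + sum_i log P(x_i | c); the denominator P(x) is never computed."""
    s = math.log(prior[c])
    for p, xi in zip(cond[c], x):
        s += math.log(p if xi == 1 else 1.0 - p)
    return s

def predict(x):
    """arg max over classes of the unnormalized score."""
    return max(prior, key=lambda c: log_score(c, x))

def posterior(x):
    """Normalized posterior, computed only to compare against predict()."""
    unnorm = {c: math.exp(log_score(c, x)) for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}
```

Dividing every score by the same constant cannot change their order, so `predict` and the arg max of `posterior` always agree.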

8 Example: Document Classification
DOCUMENTS:
"To finance extra spending on Labour's policies, such as education, Mr. Brown announced that the Treasury would collect 30 billion pounds by selling national assets like the Tote as well as government shares in British Energy and the.." FINANCE
"England have won the third Test at Mumbai by 212 runs and secured a share of the series which few observers, if any, gave them hope of avoiding defeat. Set 313 to win, India folded to 100 all out an hour and a half into the afternoon session, with their" SPORTS
Classify documents based on their vocabulary:
$P(\text{class} = C \mid w_{\text{Brown}}, w_{\text{finance}}, w_{\text{spending}}, w_{\text{Treasury}}, \ldots)$

9 Observed Variables in NB
The X variables $X_1, \ldots, X_n$:
The Bernoulli model introduces a set of Bernoulli RVs, one for each item in our vocabulary, such that $X_w = 1$ iff $w$ appears in the document.
The multinomial model introduces an RV for each position in a document. The RV is multinomial, ranging over the vocabulary. E.g. $X_1 = \text{England}$, $X_2 = \text{have}$, $X_3 = \ldots$
But we'd like positional independence: $P(X_i = \text{England} \mid C) = P(X_j = \text{England} \mid C)$
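The two representations can be contrasted with a small sketch. The function names, the toy vocabulary, and the decision to drop out-of-vocabulary tokens in the multinomial case are assumptions made for illustration.

```python
def bernoulli_features(tokens, vocabulary):
    """One indicator per vocabulary item: X_w = 1 iff w appears in the document."""
    present = set(tokens)
    return {w: int(w in present) for w in vocabulary}

def multinomial_features(tokens, vocabulary):
    """One variable per position, each taking a vocabulary word as its value.

    Tokens outside the vocabulary are dropped (an assumption of this sketch).
    """
    vocab = set(vocabulary)
    return [t for t in tokens if t in vocab]

vocabulary = ["england", "have", "won", "test", "treasury"]
tokens = "england have won the third test england".split()
```

Here `bernoulli_features` records "england" once no matter how often it occurs, while `multinomial_features` keeps both occurrences, which is what lets the multinomial model use word frequency.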

10 Generative Story
Bernoulli Case:
1 Generate a document class from $P(C)$
2 Generate an indicator variable $X_w$ for each vocabulary item
3 Generate words according to which $X_w = 1$
Multinomial Case:
1 Generate a document class from $P(C)$
2 For each position $i$, generate a word from $P(X_i = w \mid C)$
3 Do this for all positions in the document
Note that a true generative model would require modeling document length.
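Both generative stories can be sketched as sampling procedures. All parameter values and names below are assumed for illustration, and the document length is passed in as an argument because, as noted above, the model itself does not generate it.

```python
import random

# ASSUMED toy parameters.
prior = {"FINANCE": 0.5, "SPORTS": 0.5}
word_given_class = {       # multinomial case: P(X_i = w | C)
    "FINANCE": {"treasury": 0.5, "spending": 0.4, "test": 0.1},
    "SPORTS": {"treasury": 0.1, "spending": 0.1, "test": 0.8},
}
present_given_class = {    # Bernoulli case: P(X_w = 1 | C)
    "FINANCE": {"treasury": 0.9, "spending": 0.8, "test": 0.1},
    "SPORTS": {"treasury": 0.1, "spending": 0.2, "test": 0.9},
}

def sample_class(rng):
    """Step 1 in both cases: draw the document class from P(C)."""
    return rng.choices(list(prior), weights=list(prior.values()))[0]

def sample_multinomial_doc(length, rng):
    """Draw a class, then one word per position from P(X_i = w | C)."""
    c = sample_class(rng)
    words = list(word_given_class[c])
    weights = [word_given_class[c][w] for w in words]
    return c, rng.choices(words, weights=weights, k=length)

def sample_bernoulli_doc(rng):
    """Draw a class, then an indicator X_w per vocabulary item; keep words with X_w = 1."""
    c = sample_class(rng)
    present = {w for w, p in present_given_class[c].items() if rng.random() < p}
    return c, present
```

A call such as `sample_multinomial_doc(5, random.Random(0))` returns a class label and a list of five words; the Bernoulli version returns a set, since it carries no order or count information.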

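11 Maximum Likelihood Estimation
We need to find estimates for $P(C)$ and for the class-conditional probabilities $P(X_i \mid C)$ that MAXIMIZE the likelihood.
Estimation, for a data set $D$ of documents $d$ with class $c_d$ and observations $x_{d,1}, \ldots, x_{d,n}$:
$\log L(\theta; D) = \log \prod_{d \in D} P(c_d, x_{d,1}, \ldots, x_{d,n}) = \sum_{d \in D} \log P(c_d, x_{d,1}, \ldots, x_{d,n}) = \sum_{d \in D} \left[ \log P(c_d) + \sum_{i=1}^{n} \log P(x_{d,i} \mid c_d) \right]$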

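12 Estimation Cont.
Bernoulli ML estimate: $\hat{P}(X_w = 1 \mid c) = \frac{\#\text{ of docs of class } c \text{ that } w \text{ occurs in}}{\#\text{ of documents of class } c}$
Multinomial ML estimate: $\hat{P}(X = w \mid c) = \frac{\#\text{ of times } w \text{ occurs across documents of class } c}{\sum_{w'} \#\text{ of times } w' \text{ occurs across documents of class } c}$
Class prior ML estimate: $\hat{P}(c) = \frac{\#\text{ of documents of class } c}{\sum_{c'} \#\text{ of documents of class } c'}$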
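The three ML estimates are all ratios of counts, which a short sketch can show on a toy corpus. The function name, the data layout (a list of `(class, tokens)` pairs), and the three example documents are assumptions for illustration.

```python
from collections import Counter, defaultdict

def ml_estimates(docs):
    """docs: list of (class_label, list_of_tokens). Returns (prior, bernoulli, multinomial)."""
    class_counts = Counter(c for c, _ in docs)
    doc_freq = defaultdict(Counter)    # doc_freq[c][w]: # docs of class c containing w
    term_freq = defaultdict(Counter)   # term_freq[c][w]: # occurrences of w in class c
    for c, tokens in docs:
        doc_freq[c].update(set(tokens))
        term_freq[c].update(tokens)

    prior = {c: n / len(docs) for c, n in class_counts.items()}
    bernoulli = {c: {w: k / class_counts[c] for w, k in doc_freq[c].items()}
                 for c in class_counts}
    multinomial = {}
    for c in class_counts:
        total = sum(term_freq[c].values())
        multinomial[c] = {w: k / total for w, k in term_freq[c].items()}
    return prior, bernoulli, multinomial

docs = [
    ("FINANCE", ["treasury", "spending", "treasury"]),
    ("FINANCE", ["spending", "shares"]),
    ("SPORTS", ["test", "runs", "test"]),
]
prior, bernoulli, multinomial = ml_estimates(docs)
```

In this toy corpus "treasury" never occurs in a SPORTS document, so its ML estimate for that class is zero (it is simply absent from the dictionaries), and any document containing it would get probability zero under SPORTS.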

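13 Smoothing
Estimates can be problematic with small amounts of data.
Other estimates can be more reliable.
Laplace smoothing: $\hat{P}(X = v \mid c) = \frac{\mathrm{count}(v, c) + 1}{\mathrm{count}(c) + |\mathrm{Val}(X)|}$ (for a Bernoulli variable, $|\mathrm{Val}(X)| = 2$)
Generalized Laplace smoothing: $\hat{P}(X = v \mid c) = \frac{\mathrm{count}(v, c) + s}{\mathrm{count}(c) + s\,|\mathrm{Val}(X)|}$
Where $\mathrm{count}(v, c)$ is the number of times $X = v$ is observed with class $c$, $\mathrm{count}(c) = \sum_{v'} \mathrm{count}(v', c)$, and $s$ is the smoothing constant.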
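A minimal sketch of generalized Laplace smoothing, assuming the common form (count + s) / (total + s · number of values). The function and parameter names are hypothetical.

```python
def smoothed_estimate(count_v_c, count_c, num_values, s=1.0):
    """Generalized Laplace estimate of P(X = v | c).

    count_v_c  : times X = v was observed with class c
    count_c    : total observations of X with class c
    num_values : |Val(X)|, the number of values X can take
    s          : smoothing constant; s = 1 is Laplace smoothing, s = 0 is the ML estimate
    """
    return (count_v_c + s) / (count_c + s * num_values)
```

For a Bernoulli variable never seen as 1 in two documents of a class, `smoothed_estimate(0, 2, 2)` is 0.25 rather than 0, so one unseen word can no longer force a whole document's probability to zero.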

14 Document Classification with NB
$P(\text{class} = C \mid w_{\text{Brown}}, w_{\text{finance}}, w_{\text{spending}}, w_{\text{Treasury}}, \ldots)$
Is proportional to:
$P(\text{class} = C)\; P(w_{\text{Brown}} \mid \text{class} = C)\; P(w_{\text{finance}} \mid \text{class} = C)\; P(w_{\text{spending}} \mid \text{class} = C)\; P(w_{\text{Treasury}} \mid \text{class} = C) \cdots$
The class prior probability is just the frequency of the class in the training data.
Note that the model assumes each word in a document is independent, given the class of the document. Clearly, this assumption is wrong. However, the classifier still performs well in practice.
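Putting the pieces together, the sketch below trains a multinomial Naïve Bayes classifier on a four-document toy corpus and scores new documents in log space. The corpus, the add-one smoothing, and the choice to skip words outside the training vocabulary are assumptions for illustration, not the lecture's own implementation.

```python
import math
from collections import Counter

def train(docs, s=1.0):
    """docs: list of (class_label, tokens). Returns log priors and smoothed log conditionals."""
    vocab = sorted({w for _, toks in docs for w in toks})
    class_counts = Counter(c for c, _ in docs)
    word_counts = {c: Counter() for c in class_counts}
    for c, toks in docs:
        word_counts[c].update(toks)

    # Class prior: frequency of the class in the training data.
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_cond[c] = {w: math.log((word_counts[c][w] + s) / (total + s * len(vocab)))
                       for w in vocab}
    return log_prior, log_cond

def classify(tokens, log_prior, log_cond):
    """arg max over classes of log P(c) + sum over words of log P(w | c)."""
    def score(c):
        # Words not seen in training are skipped (an assumption of this sketch).
        return log_prior[c] + sum(log_cond[c][w] for w in tokens if w in log_cond[c])
    return max(log_prior, key=score)

docs = [
    ("FINANCE", ["brown", "finance", "spending", "treasury"]),
    ("FINANCE", ["treasury", "shares", "spending"]),
    ("SPORTS", ["england", "test", "runs", "india"]),
    ("SPORTS", ["test", "series", "runs"]),
]
log_prior, log_cond = train(docs)
```

With equal class priors, `classify(["treasury", "spending"], log_prior, log_cond)` is decided entirely by the word likelihoods, each of which contributes independently to the score.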

15 Preview of Graphical Models
Naïve Bayes is a simple model with strong conditional independence assumptions.
Graph: Class → Observations
Graphical models allow us to determine/specify conditional independence assumptions.
They facilitate development of algorithms for learning and inference.

16 Motivation for Conditional Model
Strong independence assumptions in NB result in poorly calibrated posterior probabilities.
Also, NB is generative: it models the joint distribution $P(C, X_1, \ldots, X_n) = P(C)\,P(X_1, \ldots, X_n \mid C)$
It can generate the observed data (e.g. given a class) AND make predictions about the class given the data.
We usually only care about making predictions.
Modeling power is used to properly generate the data.
