Parametric Density Estimation: Bayesian Estimation. Naïve Bayes Classifier


Bayesian Parameter Estimation
Suppose we have some idea of the range where the parameters $\theta$ should be. Shouldn't we formalize such prior knowledge in the hope that it will lead to better parameter estimation?
Let $\theta$ be a random variable with prior distribution $p(\theta)$. This is the key difference between ML and Bayesian parameter estimation. This key assumption allows us to fully exploit the information provided by the data.

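Bayesian Parameter Estimation
$\theta$ is a random variable with prior $p(\theta)$. Unlike the MLE case, $p(x|\theta)$ is a conditional density.
The training data $D$ allow us to convert $p(\theta)$ into a posterior probability density $p(\theta|D)$: after we observe the data $D$, we can compute the posterior $p(\theta|D)$ using Bayes' rule.
But $p(\theta|D)$ is not our final goal; our final goal is the unknown $p(x)$. Therefore a better thing to do is to compute $p(x|D)$, which is as close as we can come to the unknown $p(x)$!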

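Bayesian Estimation: Formula for $p(x|D)$
From the definition of the joint distribution:
$$p(x|D) = \int p(x, \theta \mid D)\, d\theta$$
Using the definition of conditional probability:
$$p(x|D) = \int p(x \mid \theta, D)\, p(\theta|D)\, d\theta$$
But $p(x \mid \theta, D) = p(x|\theta)$, since $p(x|\theta)$ is completely specified by $\theta$, so
$$p(x|D) = \int p(x|\theta)\, p(\theta|D)\, d\theta$$
Using Bayes' formula, for i.i.d. samples $D = \{x_1, \dots, x_n\}$:
$$p(\theta|D) = \frac{p(D|\theta)\, p(\theta)}{\int p(D|\theta)\, p(\theta)\, d\theta}, \qquad p(D|\theta) = \prod_{k=1}^{n} p(x_k|\theta)$$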

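Bayesian Estimation vs. MLE
So in principle $p(x|D)$ can be computed:
$$p(x|D) = \int p(x|\theta)\, \frac{p(D|\theta)\, p(\theta)}{\int p(D|\theta)\, p(\theta)\, d\theta}\, d\theta$$
In practice, it may be hard to do the integration analytically, and we may have to resort to numerical methods.
Contrast this with the MLE solution, which requires differentiation of the likelihood to get $p(x|\hat\theta)$. Differentiation is easy and can always be done analytically.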
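As a rough illustration of the numerical route, the sketch below approximates $p(x|D)$ for a scalar $\theta$ by a Riemann sum over an evenly spaced grid. The names `predictive_density` and `normal_pdf`, the grid, and the Gaussian test case are illustrative assumptions, not part of the original material; for larger data sets one would accumulate log-likelihoods to avoid underflow.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predictive_density(x, data, likelihood, prior, grid):
    """Approximate p(x|D) = ∫ p(x|θ) p(θ|D) dθ on an evenly spaced grid of θ values.

    likelihood(x, theta) plays the role of p(x|θ) and prior(theta) of p(θ).
    The samples in `data` are treated as i.i.d., so p(D|θ) is a product.
    """
    step = grid[1] - grid[0]
    # Unnormalized posterior p(D|θ) p(θ) at every grid point.
    weights = []
    for theta in grid:
        w = prior(theta)
        for xk in data:
            w *= likelihood(xk, theta)
        weights.append(w)
    evidence = sum(weights) * step  # ≈ ∫ p(D|θ) p(θ) dθ
    # ≈ ∫ p(x|θ) p(D|θ) p(θ) dθ; dividing by the evidence gives a posterior-weighted average.
    numerator = sum(likelihood(x, t) * w for t, w in zip(grid, weights)) * step
    return numerator / evidence

if __name__ == "__main__":
    # Example: Gaussian likelihood with known variance 1, Gaussian prior N(0, 4) on the mean.
    data = [1.2, 0.8, 1.5]
    grid = [-10.0 + 0.005 * i for i in range(4001)]
    p = predictive_density(1.0, data,
                           likelihood=lambda xk, mu: normal_pdf(xk, mu, 1.0),
                           prior=lambda mu: normal_pdf(mu, 0.0, 4.0),
                           grid=grid)
    print(f"p(x=1.0 | D) ≈ {p:.5f}")
```

For a Gaussian likelihood with known variance and a Gaussian prior on the mean, the value can be compared with the closed form $N(\mu_n, \sigma^2 + \sigma_n^2)$ obtained for that case.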

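Bayesian Estimation vs. MLE
$$p(x|D) = \int p(x|\theta)\, p(\theta|D)\, d\theta$$
Here $p(x|\theta)$ is the proposed model with a certain $\theta$, and $p(\theta|D)$ is the support that $\theta$ receives from the data $D$.
The above equation implies that if we are less certain about the exact value of $\theta$, we should consider a weighted average of $p(x|\theta)$ over the possible values of $\theta$.
Contrast this with the MLE solution, which always gives us a single model: $p(x|\hat\theta)$.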

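Bayesian Estimation for Gaussian with unknown $\mu$
Let $p(x|\mu)$ be $N(\mu, \sigma^2)$; that is, $\sigma^2$ is known, but $\mu$ is unknown and needs to be estimated, so $\theta = \mu$.
Assume a prior over $\mu$: $p(\mu) \sim N(\mu_0, \sigma_0^2)$. $\mu_0$ encodes some prior knowledge about the true mean $\mu$, while $\sigma_0^2$ measures our prior uncertainty.
The posterior distribution is:
$$\begin{aligned}
p(\mu|D) &= \frac{p(D|\mu)\,p(\mu)}{\int p(D|\mu)\,p(\mu)\,d\mu} = \alpha \prod_{k=1}^{n} p(x_k|\mu)\, p(\mu)\\
&= \alpha' \exp\left[-\frac{1}{2}\left(\sum_{k=1}^{n}\left(\frac{\mu - x_k}{\sigma}\right)^2 + \left(\frac{\mu - \mu_0}{\sigma_0}\right)^2\right)\right]\\
&= \alpha'' \exp\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]
\end{aligned}$$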

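Bayesian Estimation for Gaussian with unknown $\mu$
Here factors that do not depend on $\mu$ have been absorbed into the constants $\alpha'$ and $\alpha''$.
$p(\mu|D)$ is an exponent of a quadratic function of $\mu$, i.e., it is a normal density, and $p(\mu|D)$ remains normal for any number of training samples. If we write
$$p(\mu|D) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[-\frac{1}{2}\left(\frac{\mu - \mu_n}{\sigma_n}\right)^2\right] = \alpha''' \exp\left[-\frac{1}{2}\left(\frac{\mu^2}{\sigma_n^2} - 2\frac{\mu_n}{\sigma_n^2}\mu\right)\right]$$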

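Bayesian Estimation for Gaussian with unknown $\mu$
then, identifying the coefficients, we get
$$\frac{1}{\sigma_n^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}, \qquad \frac{\mu_n}{\sigma_n^2} = \frac{n}{\sigma^2}\hat\mu_n + \frac{\mu_0}{\sigma_0^2}$$
where $\hat\mu_n = \frac{1}{n}\sum_{k=1}^{n} x_k$ is the sample mean.
Solving explicitly for $\mu_n$ and $\sigma_n^2$ we obtain:
$$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\hat\mu_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0 \quad \text{(our best guess after observing } n \text{ samples)}$$
$$\sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2} \quad \text{(uncertainty about the guess; decreases monotonically with } n\text{)}$$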
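The closed-form update can be written as a short function. This is a minimal sketch assuming $\sigma^2 > 0$; the name `posterior_mean_var` and the sample values are hypothetical. It returns $(\mu_n, \sigma_n^2)$, and with no data it returns the prior $(\mu_0, \sigma_0^2)$.

```python
def posterior_mean_var(data, sigma2, mu0, sigma0_2):
    """Posterior N(mu_n, sigma_n^2) over the unknown mean of N(mu, sigma2).

    sigma2   -- known variance sigma^2 of the data (assumed > 0)
    mu0      -- prior mean mu_0
    sigma0_2 -- prior variance sigma_0^2
    """
    n = len(data)
    mu_hat = sum(data) / n if n > 0 else 0.0  # sample mean; its weight is 0 when n == 0
    denom = n * sigma0_2 + sigma2
    mu_n = (n * sigma0_2 / denom) * mu_hat + (sigma2 / denom) * mu0
    sigma_n2 = sigma0_2 * sigma2 / denom
    return mu_n, sigma_n2

if __name__ == "__main__":
    samples = [1.2, 0.8, 1.5, 0.9, 1.1]
    for n in range(len(samples) + 1):
        mu_n, var_n = posterior_mean_var(samples[:n], sigma2=1.0, mu0=0.0, sigma0_2=4.0)
        print(f"n={n}: mu_n={mu_n:.4f}, sigma_n^2={var_n:.4f}")
```

Running the loop shows $\sigma_n^2$ shrinking with every added sample while $\mu_n$ moves from $\mu_0$ toward the sample mean; setting `sigma0_2=0` keeps $\mu_n = \mu_0$ regardless of the data.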

Bayesian Estimation for Gaussian with unknown $\mu$
Each additional observation decreases our uncertainty about the true value of $\mu$. As $n$ increases, $p(\mu|D)$ becomes more and more sharply peaked, approaching a Dirac delta function as $n$ approaches infinity. This behavior is known as Bayesian learning.

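Bayesian Estimation for Gaussian with unknown $\mu$
$$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\hat\mu_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0$$
In general, $\mu_n$ is a linear combination of the sample mean $\hat\mu_n$ and the prior mean $\mu_0$, with coefficients that are non-negative and sum to 1. Thus $\mu_n$ lies somewhere between $\hat\mu_n$ and $\mu_0$.
If $\sigma_0 \neq 0$, $\mu_n \to \hat\mu_n$ as $n \to \infty$.
If $\sigma_0 = 0$, our a priori certainty that $\mu = \mu_0$ is so strong that no number of observations can change our opinion.
If the a priori guess is very uncertain ($\sigma_0$ is large), we take $\mu_n \approx \hat\mu_n$.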

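Bayesian Estimation for Gaussian with unknown $\mu$
We still should compute $p(x|D)$:
$$p(x|D) = \int p(x|\mu)\, p(\mu|D)\, d\mu = \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right] \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[-\frac{1}{2}\left(\frac{\mu - \mu_n}{\sigma_n}\right)^2\right] d\mu$$
$$p(x|D) \sim N(\mu_n,\ \sigma^2 + \sigma_n^2)$$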
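One way to see this result without evaluating the integral, assuming the standard fact that a sum of independent Gaussian variables is Gaussian: under $p(x|\mu) = N(\mu, \sigma^2)$ we can write $x = \mu + \varepsilon$ with noise $\varepsilon$ independent of $\mu$.

```latex
\begin{aligned}
x &= \mu + \varepsilon, \qquad \mu \mid D \sim N(\mu_n, \sigma_n^2), \quad \varepsilon \sim N(0, \sigma^2) \text{ independent of } \mu,\\
\mathbb{E}[x \mid D] &= \mathbb{E}[\mu \mid D] + \mathbb{E}[\varepsilon] = \mu_n,\\
\operatorname{Var}[x \mid D] &= \operatorname{Var}[\mu \mid D] + \operatorname{Var}[\varepsilon] = \sigma_n^2 + \sigma^2,\\
p(x \mid D) &= N(\mu_n,\ \sigma^2 + \sigma_n^2).
\end{aligned}
```

The predictive variance adds the known noise variance $\sigma^2$ to the remaining uncertainty $\sigma_n^2$ about $\mu$.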

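Bayesian Estimation: Example for $U[0,\theta]$
Let $X$ be $U[0,\theta]$. Recall $p(x|\theta) = 1/\theta$ inside $[0,\theta]$, else 0.
Suppose we assume a $U[0,10]$ prior on $\theta$: a good prior to use if we just know the range of $\theta$ but don't know anything else.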

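Bayesian Estimation: Example for $U[0,\theta]$
We need to compute $p(x|D) = \int p(x|\theta)\, p(\theta|D)\, d\theta$ using $p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{\int p(D|\theta)\,p(\theta)\,d\theta}$.
When computing the MLE of $\theta$, we had
$$p(D|\theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{for } \theta \ge \max\{x_1, \dots, x_n\} \\ 0 & \text{otherwise} \end{cases}$$
Thus
$$p(\theta|D) = \begin{cases} \dfrac{c}{\theta^n} & \text{for } \max\{x_1, \dots, x_n\} \le \theta \le 10 \\ 0 & \text{otherwise} \end{cases}$$
where $c$ is the normalizing constant, i.e., $c = \left(\int_{\max\{x_1,\dots,x_n\}}^{10} \frac{d\theta}{\theta^n}\right)^{-1}$.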

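Bayesian Estimation: Example for $U[0,\theta]$
We need to compute
$$p(x|D) = \int p(x|\theta)\, p(\theta|D)\, d\theta, \qquad p(\theta|D) = \begin{cases} \dfrac{c}{\theta^n} & \text{for } \max\{x_1,\dots,x_n\} \le \theta \le 10 \\ 0 & \text{otherwise} \end{cases}$$
We have 2 cases:
1. Case $x < \max\{x_1, x_2, \dots, x_n\}$:
$$p(x|D) = \int_{\max\{x_1,\dots,x_n\}}^{10} \frac{c}{\theta^{n+1}}\, d\theta = \alpha, \quad \text{a constant independent of } x$$
2. Case $\max\{x_1, x_2, \dots, x_n\} < x \le 10$:
$$p(x|D) = \int_{x}^{10} \frac{c}{\theta^{n+1}}\, d\theta = \left.-\frac{c}{n\,\theta^{n}}\right|_{x}^{10} = \frac{c}{n\,x^{n}} - \frac{c}{n\,10^{n}}$$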
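The two cases can be combined into one function, since the lower integration limit is $\max(x, \max\{x_1,\dots,x_n\})$. A minimal sketch, assuming all samples lie in $(0, 10)$; `uniform_predictive` is a hypothetical name, and the $n = 1$ branch uses $\int \theta^{-1}\, d\theta = \ln\theta$.

```python
import math

def uniform_predictive(x, data, upper=10.0):
    """Bayes predictive density p(x|D) for X ~ U[0, theta] with prior theta ~ U[0, upper].

    The posterior is p(theta|D) = c / theta^n on [max(D), upper], so
    p(x|D) = ∫ (1/theta) * c / theta^n dtheta over theta in [max(x, max(D)), upper].
    Assumes all samples lie in (0, upper).
    """
    n = len(data)
    m = max(data)
    # Normalizing constant c = 1 / ∫_m^upper theta^(-n) dtheta.
    if n == 1:
        c = 1.0 / math.log(upper / m)
    else:
        c = (n - 1) / (m ** (1 - n) - upper ** (1 - n))
    if x < 0.0 or x > upper:
        return 0.0
    # p(x|theta) = 1/theta is nonzero only for theta >= x, so the lower limit is max(x, m):
    # below m the result is a constant, above m it decays toward 0 at x = upper.
    lower = max(x, m)
    return (c / n) * (lower ** (-n) - upper ** (-n))
```

For $x$ below the largest sample the function returns the same constant, and above it the density decays to zero at 10; numerically integrating it over $[0, 10]$ gives approximately 1. As $n$ grows the flat part approaches $1/\max\{x_1,\dots,x_n\}$, the ML density. For very large $n$ the powers $m^{1-n}$ can overflow or underflow, so a log-space version would be preferable.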

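Bayesian Estimation: Example for $U[0,\theta]$
[Figure: ML density $p(x|\hat\theta)$ vs. Bayes density $p(x|D)$]
Note that even for $x > \max\{x_1, \dots, x_n\}$, the Bayes density is not zero, which makes sense.
Curious fact: the Bayes density is not uniform, i.e., it does not have the functional form that we have assumed!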

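ML vs. Bayesian Estimation with Broad Prior
Suppose $p(\theta)$ is flat and broad (close to a uniform prior).
$p(\theta|D)$ tends to sharpen if there is a lot of data. Since $p(\theta|D) \propto p(D|\theta)\,p(\theta)$, $p(\theta|D)$ will have the same sharp peak as $p(D|\theta)$. But by definition, the peak of $p(D|\theta)$ is the ML estimate $\hat\theta$.
The integral is dominated by the peak:
$$p(x|D) = \int p(x|\theta)\, p(\theta|D)\, d\theta \approx p(x|\hat\theta) \int p(\theta|D)\, d\theta = p(x|\hat\theta)$$
Thus as $n$ goes to infinity, the Bayesian estimate will approach the density corresponding to the MLE!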
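For the Gaussian case with unknown $\mu$, this limit can be checked directly from the closed-form results. A worked instance, assuming $\sigma_0 \neq 0$ and using that the ML estimate of a Gaussian mean is the sample mean $\hat\mu_n$, which stays bounded as $n$ grows:

```latex
\begin{aligned}
\sigma_n^2 &= \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2} \;\xrightarrow{\,n \to \infty\,}\; 0,
\qquad
\mu_n - \hat\mu_n = \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,(\mu_0 - \hat\mu_n) \;\xrightarrow{\,n \to \infty\,}\; 0,\\
p(x \mid D) &= N(\mu_n,\ \sigma^2 + \sigma_n^2) \;\approx\; N(\hat\mu_n,\ \sigma^2) = p(x \mid \hat\mu) \quad \text{for large } n.
\end{aligned}
```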

ML vs. Bayesian Estimation
Number of training data: the two methods are equivalent assuming an infinite number of training data and prior distributions that do not exclude the true solution. For small training data sets, they give different results in most cases.
Computational complexity: ML uses differential calculus or gradient search for maximizing the likelihood. Bayesian estimation requires complex multidimensional integration techniques.

ML vs. Bayesian Estimation
Solution complexity: ML solutions are easier to interpret (i.e., they must be of the assumed parametric form). A Bayesian estimation solution might not be of the parametric form assumed; it is hard to interpret, since it returns a weighted average of models.
Broad or asymmetric $p(\theta|D)$: in this case, the two methods will give different solutions. Bayesian methods will explicitly exploit such information.

ML vs. Bayesian Estimation
General comments: there are strong theoretical and methodological arguments supporting Bayesian estimation. In practice, ML estimation is simpler and can lead to comparable performance.

Naïve Baes Classfer

Unbiased Learning of Bayes Classifiers is Impractical
Learn a Bayes classifier by estimating $P(X|Y)$ and $P(Y)$. Assume $Y$ is boolean and $X$ is a vector of $n$ boolean attributes. In this case, we need to estimate a set of parameters
$$\theta_{ij} \equiv P(X = x_i \mid Y = y_j)$$
where $i$ takes on $2^n$ possible values and $j$ takes on 2 possible values.
How many parameters? For a particular value $y_j$ and the $2^n$ possible values of $x_i$, we need to compute $2^n - 1$ independent parameters. Given the two possible values for $Y$, we must estimate a total of $2(2^n - 1)$ such parameters.
Complex model, high variance with limited data!

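Conditional Independence
Definition: $X$ is conditionally independent of $Y$ given $Z$ if the probability distribution governing $X$ is independent of the value of $Y$, given the value of $Z$:
$$(\forall i, j, k)\quad P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$$
Example: $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$.
Note that in general Thunder is not independent of Rain, but it is given Lightning.
Equivalent to:
$$P(X, Y \mid Z) = P(X \mid Y, Z)\, P(Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$$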
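The definition can be checked numerically on a small joint distribution. The sketch below builds $P(\text{Rain}, \text{Lightning}, \text{Thunder})$ so that Thunder depends on Rain only through Lightning; all probabilities are hypothetical numbers chosen for illustration, and the helper names are assumptions.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_RAIN = {True: 0.3, False: 0.7}
P_LIGHTNING_GIVEN_RAIN = {True: 0.4, False: 0.05}     # P(Lightning=True | Rain)
P_THUNDER_GIVEN_LIGHTNING = {True: 0.9, False: 0.02}  # P(Thunder=True | Lightning)

def bern(p_true, value):
    """Probability of a boolean outcome `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(rain, lightning, thunder):
    """P(Rain, Lightning, Thunder) with Thunder depending on Rain only through Lightning."""
    return (P_RAIN[rain]
            * bern(P_LIGHTNING_GIVEN_RAIN[rain], lightning)
            * bern(P_THUNDER_GIVEN_LIGHTNING[lightning], thunder))

def prob(event):
    """Sum the joint over all assignments (rain, lightning, thunder) satisfying `event`."""
    return sum(joint(r, l, t) for r, l, t in product([True, False], repeat=3) if event(r, l, t))

def p_thunder_given(rain=None, lightning=None):
    """P(Thunder=True | the given values of Rain and/or Lightning)."""
    def cond(r, l, t):
        return (rain is None or r == rain) and (lightning is None or l == lightning)
    return prob(lambda r, l, t: t and cond(r, l, t)) / prob(cond)

if __name__ == "__main__":
    print(p_thunder_given(rain=True, lightning=True), p_thunder_given(lightning=True))
    print(p_thunder_given(rain=True), p_thunder_given(rain=False))
```

With these numbers $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$ holds for both values of Rain, while $P(\text{Thunder} \mid \text{Rain})$ is about 0.372 versus 0.064, so Thunder is not independent of Rain on its own.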

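Derivation of Naive Bayes Algorithm
The Naive Bayes algorithm assumes that the attributes $X_1, \dots, X_n$ are all conditionally independent of one another, given $Y$. This dramatically simplifies the representation of $P(X|Y)$ and estimating $P(X|Y)$ from the training data.
Consider $X = (X_1, X_2)$:
$$P(X|Y) = P(X_1, X_2 \mid Y) = P(X_1 \mid X_2, Y)\, P(X_2 \mid Y) = P(X_1 \mid Y)\, P(X_2 \mid Y)$$
For $X$ containing $n$ attributes:
$$P(X|Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$
Given boolean $X$ and $Y$, now we need only $2n$ parameters to define $P(X|Y)$, which is a dramatic reduction compared to the $2(2^n - 1)$ parameters if we make no conditional independence assumption.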
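The two parameter counts can be compared directly; a trivial sketch with hypothetical function names:

```python
def full_joint_params(n):
    """Independent parameters of P(X|Y) for n boolean attributes and a boolean Y
    with no independence assumption: (2**n - 1) per class, two classes."""
    return 2 * (2 ** n - 1)

def naive_bayes_params(n):
    """Under conditional independence: one P(X_i = 1 | Y = y) per attribute and class."""
    return 2 * n

if __name__ == "__main__":
    for n in (1, 10, 30):
        print(f"n={n:2d}: full={full_joint_params(n):,}  naive Bayes={naive_bayes_params(n)}")
```

For $n = 1$ both counts are 2, but already for $n = 30$ the full model needs $2(2^{30} - 1) = 2{,}147{,}483{,}646$ parameters versus 60 under the Naive Bayes assumption.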

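The Naïve Bayes Classifier
Given: the prior $P(Y)$; $n$ features $X$ that are conditionally independent given the class $Y$; and, for each $X_i$, the likelihood $P(X_i|Y)$.
The probability that $Y$ will take on its $k$th possible value is
$$P(Y = y_k \mid X_1, \dots, X_n) = \frac{P(Y = y_k) \prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j) \prod_i P(X_i \mid Y = y_j)}$$
The decision rule:
$$y^* = \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$$
If the assumption holds, NB is the optimal classifier!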

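Naïve Bayes for Discrete Inputs
Given $n$ attributes $X_i$, each taking on $J$ possible discrete values, and $Y$ a discrete variable taking on $K$ possible values.
MLE for the likelihood $P(X_i = x_{ij} \mid Y = y_k)$ given a set of training examples $D$:
$$\hat P(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}$$
where the $\#D\{x\}$ operator returns the number of elements in the set $D$ that satisfy property $x$.
MLE for the prior:
$$\hat P(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}$$
where $|D|$ is the number of elements in the training set $D$.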

NB Example
Given: training data [Table: training examples with attributes $X$ and label $Y$]
Classify the following novel instance: (Outlook = sunny, Temp = cool, Humidity = high, Wind = strong)

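NB Example
$$v_{NB} = \arg\max_{v \in \{yes,\, no\}} P(v)\, P(\text{Outlook} = sunny \mid v)\, P(\text{Temp} = cool \mid v)\, P(\text{Humidity} = high \mid v)\, P(\text{Wind} = strong \mid v)$$
Priors: $P(\text{PlayTennis} = yes) = 9/14 = 0.64$, $P(\text{PlayTennis} = no) = 5/14 = 0.36$.
Conditional probabilities, e.g. Wind = strong: $P(\text{Wind} = strong \mid \text{PlayTennis} = yes) = 3/9 = 0.33$, $P(\text{Wind} = strong \mid \text{PlayTennis} = no) = 3/5 = 0.60$.
$$P(yes)\,P(sunny \mid yes)\,P(cool \mid yes)\,P(high \mid yes)\,P(strong \mid yes) = 0.0053$$
$$P(no)\,P(sunny \mid no)\,P(cool \mid no)\,P(high \mid no)\,P(strong \mid no) = 0.0206$$
so NB classifies the instance as PlayTennis = no.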
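The computation above can be reproduced with count-based MLE estimates. This sketch assumes the training table is the 14-example PlayTennis data set from Mitchell's Machine Learning (1997), which is consistent with the priors 9/14 and 5/14 and the Wind = strong probabilities quoted here; the function names are hypothetical.

```python
from collections import Counter

# Assumed training set (Mitchell's PlayTennis table): (Outlook, Temp, Humidity, Wind, PlayTennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def train_naive_bayes(data):
    """MLE estimates from rows (x_1, ..., x_n, y):
    P(Y=y) = #D{Y=y} / |D| and P(X_i=x | Y=y) = #D{X_i=x and Y=y} / #D{Y=y}."""
    class_counts = Counter(row[-1] for row in data)
    pair_counts = Counter((i, x, row[-1]) for row in data for i, x in enumerate(row[:-1]))
    priors = {y: count / len(data) for y, count in class_counts.items()}

    def likelihood(i, x, y):
        return pair_counts[(i, x, y)] / class_counts[y]

    return priors, likelihood

def classify(instance, priors, likelihood):
    """Decision rule argmax_y P(y) * prod_i P(x_i | y); also returns the unnormalized scores."""
    scores = {}
    for y, prior in priors.items():
        score = prior
        for i, x in enumerate(instance):
            score *= likelihood(i, x, y)
        scores[y] = score
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    priors, likelihood = train_naive_bayes(DATA)
    label, scores = classify(("sunny", "cool", "high", "strong"), priors, likelihood)
    print(label, {y: round(s, 4) for y, s in scores.items()})
```

It predicts no, with unnormalized scores of about 0.0206 for no and 0.0053 for yes; normalizing gives $P(no \mid \text{instance}) \approx 0.0206 / (0.0206 + 0.0053) \approx 0.80$.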

Subtleties of NB Classifier
Violating the NB assumption: usually, features are not conditionally independent. Nonetheless, NB often performs well, even when the assumption is violated. [Domingos & Pazzani '96] discuss some conditions for good performance.

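Subtleties of NB Classifier
Insufficient training data: what if you never see a training instance where $X_1 = a$ when $Y = b$? Then $\hat P(X_1 = a \mid Y = b) = 0$. Thus, no matter what values $X_2, \dots, X_n$ take:
$$P(Y = b \mid X_1 = a, X_2, \dots, X_n) = 0$$
Solution?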

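Subtleties of NB Classifier
Insufficient training data: to avoid this, use a "smoothed" estimate, which effectively adds in a number of additional "hallucinated" examples and assumes these hallucinated examples are spread evenly over the possible values of $X_i$. This smoothed estimate is given by
$$\hat P(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\} + l}{\#D\{Y = y_k\} + lJ}, \qquad \hat P(Y = y_k) = \frac{\#D\{Y = y_k\} + l}{|D| + lK}$$
$l$ determines the strength of the smoothing (the number of hallucinated examples). If $l = 1$, this is called Laplace smoothing.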
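A direct transcription of the smoothed estimates, as a sketch with hypothetical names: `values` holds the observed values of one attribute $X_i$, `labels` the parallel class labels, `J` the number of possible values of $X_i$, and `K` the number of classes.

```python
def smoothed_likelihood(values, labels, x, y, J, l=1.0):
    """Smoothed estimate of P(X_i = x | Y = y):
    (#D{X_i = x and Y = y} + l) / (#D{Y = y} + l*J).
    l = 1 gives Laplace smoothing; l = 0 recovers the MLE."""
    joint = sum(1 for v, c in zip(values, labels) if v == x and c == y)
    class_count = sum(1 for c in labels if c == y)
    return (joint + l) / (class_count + l * J)

def smoothed_prior(labels, y, K, l=1.0):
    """Smoothed prior (#D{Y = y} + l) / (|D| + l*K) for K classes."""
    return (sum(1 for c in labels if c == y) + l) / (len(labels) + l * K)

if __name__ == "__main__":
    outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
    labels = ["no", "no", "yes", "yes", "no"]
    # "overcast" never occurs with "no": MLE gives 0, Laplace smoothing does not.
    print(smoothed_likelihood(outlook, labels, "overcast", "no", J=3, l=0.0))
    print(smoothed_likelihood(outlook, labels, "overcast", "no", J=3))
```

With $l = 1$ an attribute value never seen with a class gets probability $1/(\#D\{Y = y_k\} + J)$ instead of 0, and the smoothed estimates for a class still sum to 1 over the $J$ values.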

Naive Bayes for Continuous Inputs
When the $X_i$ are continuous we must choose some other way to represent the distributions $P(X_i|Y)$. One common approach is to assume that for each possible discrete value $y_k$ of $Y$, the distribution of each continuous $X_i$ is Gaussian. In order to train such a Naïve Bayes classifier we must estimate the mean and standard deviation of each of these Gaussians.

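Naive Bayes for Continuous Inputs
MLE for the means:
$$\hat\mu_{ik} = \frac{1}{\sum_j \delta(Y^j = y_k)} \sum_j X_i^j\, \delta(Y^j = y_k)$$
where $j$ refers to the $j$th training example, and where $\delta(Y = y_k)$ is 1 if $Y = y_k$ and 0 otherwise. Note that the role of $\delta$ is to select only those training examples for which $Y = y_k$.
MLE for the standard deviation:
$$\hat\sigma_{ik}^2 = \frac{1}{\sum_j \delta(Y^j = y_k)} \sum_j \left(X_i^j - \hat\mu_{ik}\right)^2 \delta(Y^j = y_k)$$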
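A minimal Gaussian Naive Bayes sketch following these MLE formulas (dividing by the class count, not count minus one). Names and the toy data are hypothetical; it assumes every feature takes at least two distinct values within each class, since a zero standard deviation would make the Gaussian degenerate. Scores are compared in log space to avoid underflow, which does not change the argmax.

```python
import math

def fit_gaussian_nb(X, y):
    """Per class k: MLE prior P(Y=k), and per feature i the MLE mean and std
    over the examples with label k. X is a list of feature lists, y a list of labels."""
    model = {}
    n = len(y)
    for k in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == k]  # the delta(Y^j = y_k) selection
        count = len(rows)
        means = [sum(col) / count for col in zip(*rows)]
        stds = [math.sqrt(sum((v - m) ** 2 for v in col) / count)
                for col, m in zip(zip(*rows), means)]
        model[k] = (count / n, means, stds)
    return model

def log_gaussian(x, mean, std):
    """log N(x; mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def predict(model, x):
    """argmax_k log P(Y=k) + sum_i log N(x_i; mu_ik, sigma_ik^2)."""
    def score(k):
        prior, means, stds = model[k]
        return math.log(prior) + sum(log_gaussian(v, m, s) for v, m, s in zip(x, means, stds))
    return max(model, key=score)

if __name__ == "__main__":
    X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
    y = ["a", "a", "a", "b", "b", "b"]
    model = fit_gaussian_nb(X, y)
    print(predict(model, [1.1, 1.9]), predict(model, [2.9, 0.6]))
```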

Learning to Classify Text
Applications: learn which news articles are of interest; learn to classify web pages by topic. Naïve Bayes is among the most effective algorithms for this task.
Target concept Interesting?: Document → {+, −}.
Represent each document by a vector of words: one attribute per word position in the document.
Learning: use training examples to estimate $P(+)$, $P(-)$, $P(doc|+)$, $P(doc|-)$.

Text Classification: Example
Text: "Text Classification, or the task of automatically assigning semantic categories to natural language text, has become one of the key methods for organizing online information. Since hand-coding classification rules is costly or even impractical, most modern approaches employ machine learning techniques to automatically learn text classifiers from examples."
The text contains 48 words.
Text representation: $a_1$ = "text", $a_2$ = "classification", ..., $a_{48}$ = "examples". The representation contains 48 attributes.
Note: text size may vary, but this will not cause a problem.

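NB Conditional Independence Assumption
$$P(doc \mid v_j) = \prod_{i=1}^{\text{length}(doc)} P(a_i = w_k \mid v_j)$$
where $w_k$ indicates the $k$th word in the English vocabulary, and $P(a_i = w_k \mid v_j)$ is the probability that word position $i$ is $w_k$, given $v_j$.
The NB assumption is that the word probabilities for one text position are independent of the words in other positions, given the document classification $v_j$.
Clearly not true: the probability of the word "learning" may be greater if the preceding word is "machine".
Necessary: without it, the number of probability terms is prohibitive.
NB performs remarkably well despite the incorrectness of the assumption.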

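Estimating Likelihood
Estimating $P(a_i = w_k \mid v_j)$ is problematic because we need to estimate it for each combination of text position, English word, and target value: $48 \times 50{,}000 \times 2 \approx 5$ million such terms.
An assumption that reduces the number of terms is the Bag of Words model: the probability of encountering a specific word $w_k$ is independent of the specific word position:
$$P(a_i = w_k \mid v_j) = P(a_m = w_k \mid v_j) \quad \forall i, m$$
Instead of estimating $P(a_1 = w_k \mid v_j), P(a_2 = w_k \mid v_j), \dots$ we estimate a single term $P(w_k \mid v_j)$.
Now we have $50{,}000 \times 2$ distinct terms.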

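Estimating Likelihood
The estimate for the likelihood is
$$P(w_k \mid v_j) = \frac{n_k + 1}{n + |\text{Vocabulary}|}$$
where $n$ is the total number of word positions in all training examples whose target value is $v_j$, $n_k$ is the number of times word $w_k$ is found among these $n$ word positions, and $|\text{Vocabulary}|$ is the total number of distinct words found within the training data.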

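Classify_Naive_Bayes_Text(Doc)
positions ← all word positions in Doc that contain tokens found in Vocabulary
Return $v_{NB}$, where
$$v_{NB} = \arg\max_{v_j \in \{+,-\}} P(v_j) \prod_{i \in \text{positions}} P(a_i \mid v_j)$$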
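A compact sketch of learning and classification for this text model, assuming documents are already tokenized into lowercase word lists; the function names and the toy corpus are hypothetical. It uses the $(n_k + 1)/(n + |\text{Vocabulary}|)$ estimate and sums log-probabilities instead of multiplying, which selects the same $v_{NB}$ while avoiding underflow on long documents.

```python
import math
from collections import Counter

def learn_naive_bayes_text(docs, labels):
    """docs: list of token lists; labels: parallel list of classes.
    Returns Vocabulary, priors P(v_j), and P(w_k|v_j) = (n_k + 1) / (n + |Vocabulary|)."""
    vocabulary = {w for doc in docs for w in doc}
    priors, cond = {}, {}
    for v in sorted(set(labels)):
        class_docs = [d for d, lab in zip(docs, labels) if lab == v]
        priors[v] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)  # n_k for each word w_k
        n = sum(counts.values())                             # word positions in class v
        cond[v] = {w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary}
    return vocabulary, priors, cond

def classify_naive_bayes_text(doc, vocabulary, priors, cond):
    """argmax_v log P(v) + sum of log P(a_i|v) over positions whose token is in Vocabulary."""
    positions = [w for w in doc if w in vocabulary]
    return max(priors, key=lambda v: math.log(priors[v])
               + sum(math.log(cond[v][w]) for w in positions))

if __name__ == "__main__":
    docs = [["machine", "learning", "text"], ["learning", "classifiers", "text"],
            ["football", "score", "match"], ["match", "score", "goal"]]
    labels = ["+", "+", "-", "-"]
    model = learn_naive_bayes_text(docs, labels)
    print(classify_naive_bayes_text(["text", "learning", "unseen"], *model))
```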