Parameter Estimation
Probabilities: Notational Convention
Mass (discrete) function: capital letters. Density (continuous) function: small letters.
Vector vs. scalar. Scalar: plain. Vector: bold. 2D: small; higher dimension: capital.
Notes: in a continuous state of fluctuation until a topic is finished (many updates).
PR ANN & ML
Parameter Estimation
The optimal classifier maximizes the product of the a priori probability and the class-conditional density, $P(\omega_i)\,p(\mathbf{x}|\omega_i)$.
Assumptions: no correlation; time-independent statistics.
Popular Approaches
Parametric: assume a certain parametric form for $p(\mathbf{x}|\omega_i)$ and estimate the parameters.
Nonparametric: do not assume a parametric form for $p(\mathbf{x}|\omega_i)$; estimate the density profile directly.
Boundary: estimate the separation hyperplane (hypersurface) between $p(\mathbf{x}|\omega_i)$ and $p(\mathbf{x}|\omega_j)$.
A Priori Probability
Given the numbers of occurrence $n_k$ of each of the $M$ classes:
$P(\omega_k) = \frac{n_k}{\sum_{j=1}^{M} n_j}$
This holds if the number of samples is large enough and the selection process is not biased.
Caveat: sampling may be biased.
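A minimal sketch of the count-based prior estimate, assuming the class labels of the training samples are available as a Python list; the function name `estimate_priors` is hypothetical. It simply divides each class count $n_k$ by the total number of samples, so it inherits the caveat above: a biased sample gives biased priors.

```python
from collections import Counter

def estimate_priors(labels):
    """Estimate P(omega_k) = n_k / sum_j n_j from a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n_k / total for cls, n_k in counts.items()}

priors = estimate_priors(["a", "a", "b", "a", "c"])
print(priors)  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```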
Class-Conditional Density
More complicated: not a single number but a distribution.
Assume a certain form, then estimate the parameters.
What form should we assume? Many are possible, but in this course we use Gaussians almost exclusively.
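Gaussian (or Normal) Distribution
Scalar case: $p(x|\omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-u)^2}{2\sigma^2}\right] \sim N(u, \sigma^2)$
Vector case: $p(\mathbf{x}|\omega_i) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{u})^T \Sigma^{-1} (\mathbf{x}-\mathbf{u})\right] \sim N(\mathbf{u}, \Sigma)$
Unknowns: class mean and variance ($u$ and $\sigma^2$, or $\mathbf{u}$ and $\Sigma$).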
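A minimal sketch of the scalar case combined with the decision rule "maximize $P(\omega_i)\,p(x|\omega_i)$", assuming each class is described by a hypothetical `(prior, mean, variance)` tuple whose values are already known; estimating those values is the subject of the rest of these notes.

```python
import math

def gaussian_pdf(x, u, var):
    """Scalar normal density N(u, var) evaluated at x."""
    return math.exp(-(x - u) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x, classes):
    """Return the class that maximizes P(omega_i) * p(x | omega_i).

    `classes` maps a class name to a (prior, mean, variance) tuple.
    """
    def score(name):
        prior, u, var = classes[name]
        return prior * gaussian_pdf(x, u, var)
    return max(classes, key=score)

classes = {"w1": (0.5, 0.0, 1.0), "w2": (0.5, 3.0, 1.0)}
print(classify(1.0, classes))  # w1
print(classify(2.0, classes))  # w2
```

With equal priors the rule reduces to picking the nearer mean; an unequal prior shifts the boundary toward the less likely class.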
[Figure: population vs. feature $x$ for a Gaussian centered at the mean $u$]
Why Gaussian (Normal)?
The central limit theorem predicts a normal distribution from IID experiments.
In reality:
there are only two numbers to estimate in the scalar case, mean and variance, or $d + d(d+1)/2$ in $d$ dimensions;
nice mathematical properties, e.g. the Fourier transform of a Gaussian is a Gaussian, and products of Gaussian densities (after renormalization) and sums of independent Gaussian variables remain Gaussian;
any linear transform of a Gaussian is a Gaussian.
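The parameter count $d + d(d+1)/2$ ($d$ means plus the distinct entries of a symmetric $d \times d$ covariance matrix) grows quadratically with the dimension. A tiny sketch with a hypothetical helper name:

```python
def gaussian_param_count(d):
    """Free parameters of a d-dimensional Gaussian: d means + d(d+1)/2 covariances."""
    return d + d * (d + 1) // 2

print([gaussian_param_count(d) for d in (1, 2, 10)])  # [2, 5, 65]
```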
Transformation (Projection)
In particular, a whitening transform can diagonalize the covariance matrix.
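One standard construction of a whitening transform, written out as an assumption since the slide does not give it: take the eigendecomposition of a positive-definite covariance matrix $\Sigma$, with eigenvectors in the columns of $\Phi$ and eigenvalues on the diagonal of $\Lambda$. Because a linear transform of a Gaussian is Gaussian, $\mathbf{y} = A_w^T\mathbf{x}$ is again Gaussian, with covariance $A_w^T \Sigma A_w$:

```latex
\Sigma\Phi = \Phi\Lambda, \qquad \Phi^T\Phi = I \\
\mathbf{y} = A_w^T\mathbf{x}, \qquad A_w = \Phi\Lambda^{-1/2} \\
\Sigma_{\mathbf{y}} = A_w^T\Sigma A_w
  = \Lambda^{-1/2}\,\Phi^T\Sigma\Phi\,\Lambda^{-1/2}
  = \Lambda^{-1/2}\,\Lambda\,\Lambda^{-1/2} = I
```

Projecting with $\Phi$ alone already gives the diagonal covariance $\Lambda$; the extra $\Lambda^{-1/2}$ rescales every axis to unit variance.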
Parameter Estimation
Maximum likelihood estimator: parameters have fixed but unknown values.
Bayesian estimator: parameters are random variables with known a priori distributions.
The Bayesian estimator allows us to change the a priori distribution by incorporating measurements, which sharpens the profile.
Graphically: MLE vs. Bayesian
[Figure: likelihood as a function of the parameters]
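Maximum Likelihood Estimator
Given:
labeled samples (observations) $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$;
an assumed distribution with $e$ parameters $\theta = \{\theta_1, \theta_2, \ldots, \theta_e\}$;
samples drawn independently from $p(\mathbf{x}|\theta)$.
Find: the parameter $\theta$ that best explains the observations.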
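MLE Formulation
Maximize the likelihood $p(\mathcal{X}|\theta) = \prod_{k=1}^{n} p(\mathbf{x}_k|\theta)$, or equivalently the log likelihood
$l(\theta) = \log p(\mathcal{X}|\theta) = \sum_{k=1}^{n} \log p(\mathbf{x}_k|\theta)$
Or solve $\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \log p(\mathbf{x}_k|\theta) = 0$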
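An Example
Scalar Gaussian with unknown mean and variance, $\theta = (u, \sigma^2)$:
$\log p(x_k|\theta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x_k - u)^2}{2\sigma^2}$
$\frac{\partial}{\partial u}\log p(x_k|\theta) = \frac{x_k - u}{\sigma^2}, \qquad \frac{\partial}{\partial \sigma^2}\log p(x_k|\theta) = -\frac{1}{2\sigma^2} + \frac{(x_k - u)^2}{2\sigma^4}$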
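An Example (cont.)
Setting the summed derivatives to 0:
$\sum_{k=1}^{n} \frac{x_k - \hat{u}}{\hat{\sigma}^2} = 0, \qquad -\sum_{k=1}^{n} \frac{1}{\hat{\sigma}^2} + \sum_{k=1}^{n} \frac{(x_k - \hat{u})^2}{\hat{\sigma}^4} = 0$
$\hat{u} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{u})^2$
Class mean as sample mean, class variance as sample variance.
The MLE of the variance is biased: $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$.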
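A minimal sketch of the closed-form estimates above, plus a small simulation of the bias, assuming samples drawn from $N(0, 1)$ with Python's `random.Random.gauss`; the function names are hypothetical. For $n = 2$ the average of $\hat{\sigma}^2$ should land near $(n-1)/n = 0.5$ rather than the true variance 1.

```python
import random

def gaussian_mle(xs):
    """MLE of a scalar Gaussian: sample mean and the 1/n sample variance."""
    n = len(xs)
    u_hat = sum(xs) / n
    var_hat = sum((x - u_hat) ** 2 for x in xs) / n
    return u_hat, var_hat

def average_mle_variance(n, trials=20000, seed=0):
    """Average the MLE variance over many size-n samples drawn from N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += gaussian_mle(xs)[1]
    return total / trials

print(gaussian_mle([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
print(average_mle_variance(2))  # roughly 0.5 = (n-1)/n, not the true 1.0
```

Multiplying $\hat{\sigma}^2$ by $n/(n-1)$ removes the bias.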
[Figure: population vs. coin weight]
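[Figure: Gaussians centered at $\hat{u}$ with different widths $\hat{\sigma}$]
If $\hat{\sigma}$ is too narrow, many sampling points will be outside the width, with a low likelihood of occurrence.
If $\hat{\sigma}$ is too wide, $1/\hat{\sigma}$ becomes too small and reduces the likelihood of occurrence.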
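The trade-off can be checked numerically. This sketch, with a hypothetical helper `log_likelihood` and made-up sample values, evaluates the Gaussian log likelihood at the sample mean for a width that is too narrow, the MLE width, and a width that is too wide; the middle value is the largest.

```python
import math

def log_likelihood(xs, u, sigma):
    """l(u, sigma) = sum_k log N(x_k | u, sigma^2)."""
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma) - (x - u) ** 2 / (2 * sigma ** 2)
        for x in xs
    )

xs = [1.0, 2.0, 3.0, 4.0]
u_hat = sum(xs) / len(xs)
sigma_hat = math.sqrt(sum((x - u_hat) ** 2 for x in xs) / len(xs))
for s in (0.5 * sigma_hat, sigma_hat, 2 * sigma_hat):
    # too narrow: points fall in the tails; too wide: the 1/sigma factor shrinks
    print(round(s, 3), round(log_likelihood(xs, u_hat, s), 3))
```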
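A Quick Word on MAP
MAP: maximum a posteriori estimator.
Similar to MLE, with one additional twist: maximize the log likelihood $l(\theta)$ plus the log prior probability of the parameter values, $\log p(\theta)$, if you know it (e.g. the mean is more likely to be $u_0$, with a normal distribution):
$\hat{\theta}_{MAP} = \arg\max_\theta \left[ l(\theta) + \log p(\theta) \right]$
MLE has a uniform prior; MAP does not necessarily.
The added term is a case of regularization.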
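A minimal sketch of MAP for the mean of a Gaussian with known variance, assuming a normal prior $N(u_0, \sigma_0^2)$ on the mean; the function name, data and step size are hypothetical. It maximizes $l(u) + \log p(u)$ by brute-force grid search so the regularization term stays visible; the result is pulled from the MLE (the sample mean, 2.5) toward $u_0 = 0$, landing at the closed-form value $\frac{n\sigma_0^2\bar{x} + \sigma^2 u_0}{n\sigma_0^2 + \sigma^2} = 1.5$.

```python
def map_mean(xs, var, u0, var0, step=1e-3):
    """Grid-search MAP estimate of a Gaussian mean.

    Likelihood: x_k ~ N(u, var) with `var` known; prior: u ~ N(u0, var0).
    """
    def log_posterior(u):
        log_lik = -sum((x - u) ** 2 for x in xs) / (2 * var)
        log_prior = -(u - u0) ** 2 / (2 * var0)  # the extra "regularization" term
        return log_lik + log_prior               # additive constants dropped

    # The maximizer lies between the sample mean and u0, so this range covers it.
    lo, hi = min(xs + [u0]), max(xs + [u0])
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return max(grid, key=log_posterior)

xs = [2.0, 2.5, 3.0]
u_mle = sum(xs) / len(xs)
u_map = map_mean(xs, var=1.0, u0=0.0, var0=0.5)
print(u_mle, round(u_map, 3))  # 2.5 1.5
```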
Bayesian Estimator
Note that MLE is a batch estimator:
all data have to be kept;
it is difficult to update the estimate;
it is difficult to incorporate other evidence;
it insists on a single measurement.
The Bayesian estimator:
allows the freedom that the parameters themselves can be random variables;
allows multiple evidence;
allows iterative updates.
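Bayesian Estimator
Based on Bayes rule, with the sample set $\mathcal{X}$ at our disposal:
$P(\omega_i|\mathbf{x}, \mathcal{X}) = \frac{p(\mathbf{x}|\omega_i, \mathcal{X})\,P(\omega_i|\mathcal{X})}{\sum_{j} p(\mathbf{x}|\omega_j, \mathcal{X})\,P(\omega_j|\mathcal{X})}$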
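Bayes Rule Formulation
Assume each sample comes from only one class, so that $p(\mathbf{x}|\omega_i, \mathcal{X})$ depends only on the samples $\mathcal{X}_i$ of class $\omega_i$, and that $P(\omega_i)$ is independent of $\mathcal{X}$:
$P(\omega_i|\mathbf{x}, \mathcal{X}) = \frac{p(\mathbf{x}|\omega_i, \mathcal{X}_i)\,P(\omega_i)}{\sum_{j} p(\mathbf{x}|\omega_j, \mathcal{X}_j)\,P(\omega_j)}$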
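How can $\mathcal{X}$ be used?
The form of the distribution $p(\mathbf{x}|\theta)$ is known (e.g. normal); the parameters $\theta$ are unknown.
For estimating the class parameters, the samples constrain $\theta$ through $p(\theta|\mathcal{X})$.
Put it all together:
$p(\mathbf{x}|\mathcal{X}) = \int p(\mathbf{x}, \theta|\mathcal{X})\,d\theta = \int p(\mathbf{x}|\theta)\,p(\theta|\mathcal{X})\,d\theta$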
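Bayes Rule Formulation (cont.)
$p(\mathbf{x}|\mathcal{X}) = \int p(\mathbf{x}|\theta)\,p(\theta|\mathcal{X})\,d\theta$
Ideally, $p(\theta|\mathcal{X})$ is sharply peaked: nonzero only at some $\hat{\theta}$ and 0 otherwise, so that
$p(\mathbf{x}|\mathcal{X}) \approx p(\mathbf{x}|\hat{\theta})$
This is MLE!
Otherwise, all possible $\theta$s are used, each weighted by $p(\theta|\mathcal{X})$.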
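The collapse to MLE can be checked numerically. This sketch assumes the one-dimensional setting of the example that follows (known variance, normal prior $N(u_0, \sigma_0^2)$ on the mean) and approximates the integral by a weighted sum over a grid of parameter values on $[-10, 10]$; the function names and data are hypothetical. With 2 samples the Bayesian density differs clearly from the plug-in $p(x|\hat{u})$; with 200 samples the posterior is sharply peaked and the two nearly agree.

```python
import math

def log_normal(x, mean, var):
    """log N(x | mean, var)."""
    return -(x - mean) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def predictive_density(x, xs, var, u0, var0, lo=-10.0, hi=10.0, steps=2001):
    """Approximate p(x|X) = integral of p(x|u) p(u|X) du by a sum over a grid of u.

    Model: x_k ~ N(u, var) with var known; prior u ~ N(u0, var0).
    The grid [lo, hi] is assumed to cover essentially all posterior mass.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    # Unnormalized log posterior: log prod_k p(x_k|u) + log p(u)
    log_post = [sum(log_normal(xk, u, var) for xk in xs) + log_normal(u, u0, var0)
                for u in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # rescaled to avoid underflow
    total = sum(weights)
    return sum(w * math.exp(log_normal(x, u, var)) for w, u in zip(weights, grid)) / total

def plug_in_density(x, xs, var):
    """MLE plug-in p(x | u_hat): what the Bayesian answer collapses to."""
    u_hat = sum(xs) / len(xs)
    return math.exp(log_normal(x, u_hat, var))

few = [0.0, 2.0]
many = [0.0, 2.0] * 100
for xs in (few, many):
    print(len(xs), predictive_density(1.0, xs, 1.0, 0.0, 1.0), plug_in_density(1.0, xs, 1.0))
```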
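Graphic Interpretation
[Figure: the sample set $\mathcal{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ explained by different parameter sets $\theta$, $\theta'$, $\theta''$]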
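An Example
Estimating the mean of a normal distribution; the variance $\sigma^2$ is known; using $n$ samples $\mathcal{X} = \{x_1, \ldots, x_n\}$.
First step: $p(x|u) \sim N(u, \sigma^2)$, with the prior $p(u) \sim N(u_0, \sigma_0^2)$
$p(u|\mathcal{X}) = \frac{p(\mathcal{X}|u)\,p(u)}{\int p(\mathcal{X}|u)\,p(u)\,du} = \alpha \prod_{k=1}^{n} p(x_k|u)\,p(u)$
The product $\prod_k p(x_k|u)$ is the current evidence; $p(u)$ carries previous and other evidence.
Key to Bayesian: both current and prior evidence can be used.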
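The posterior is again normal, $p(u|\mathcal{X}) \sim N(u_n, \sigma_n^2)$, with
$u_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\hat{u}_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,u_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}$
where $\hat{u}_n = \frac{1}{n}\sum_{k=1}^{n} x_k$ is the sample mean.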
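This helps in:
defining the mean ($u_n$);
reducing the uncertainty in the mean ($\sigma_n^2 < \sigma_0^2$).
Trust new data ($u_n \to \hat{u}_n$) if:
the class variance $\sigma^2$ is small;
the number of samples $n$ is large;
the prior is uncertain ($\sigma_0$ is large).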
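A minimal sketch of the closed-form posterior above; the function name `posterior_mean_params` and the sample values are hypothetical. Each call after the first changes one factor from the "trust new data" list, and in each case $u_n$ moves to the sample mean 2.0; the last call uses a very confident prior, which keeps $u_n$ near $u_0 = 0$ instead.

```python
def posterior_mean_params(xs, var, u0, var0):
    """Posterior N(u_n, var_n) of a Gaussian mean with known variance `var`
    and prior N(u0, var0). With no samples the prior is returned unchanged."""
    n = len(xs)
    if n == 0:
        return u0, var0
    u_hat = sum(xs) / n
    denom = n * var0 + var
    u_n = (n * var0 / denom) * u_hat + (var / denom) * u0
    var_n = var0 * var / denom
    return u_n, var_n

xs = [2.0, 2.2, 1.8]  # sample mean 2.0
print(posterior_mean_params(xs, var=1.0, u0=0.0, var0=1.0))        # between 0 and 2
print(posterior_mean_params(xs, var=0.01, u0=0.0, var0=1.0))       # small class variance
print(posterior_mean_params(xs * 100, var=1.0, u0=0.0, var0=1.0))  # many samples
print(posterior_mean_params(xs, var=1.0, u0=0.0, var0=100.0))      # uncertain prior
print(posterior_mean_params(xs, var=1.0, u0=0.0, var0=1e-4))       # confident prior
```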
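An Example (cont.)
Second step: $p(x|\mathcal{X}) = \int p(x|u)\,p(u|\mathcal{X})\,du \sim N(u_n, \sigma^2 + \sigma_n^2)$
Third step: use $p(x|\omega_i, \mathcal{X}_i)$ of each class in the discriminant $g_i(x) = p(x|\omega_i, \mathcal{X}_i)\,P(\omega_i)$.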
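A minimal sketch of the second and third steps, assuming, for illustration only, that every class shares the same known variance and the same prior $N(u_0, \sigma_0^2)$ on its mean; the function names and training values are hypothetical. The predictive density is wider than $N(u_n, \sigma^2)$ by the remaining parameter uncertainty $\sigma_n^2$.

```python
import math

def posterior_mean_params(xs, var, u0, var0):
    """Posterior N(u_n, var_n) of a Gaussian mean (known variance var, prior N(u0, var0))."""
    n = len(xs)
    if n == 0:
        return u0, var0
    u_hat = sum(xs) / n
    denom = n * var0 + var
    return (n * var0 * u_hat + var * u0) / denom, var0 * var / denom

def predictive_pdf(x, xs, var, u0, var0):
    """Second step: p(x | X) = N(x; u_n, var + var_n)."""
    u_n, var_n = posterior_mean_params(xs, var, u0, var0)
    total = var + var_n
    return math.exp(-(x - u_n) ** 2 / (2 * total)) / math.sqrt(2 * math.pi * total)

def bayes_classify(x, training, priors, var, u0, var0):
    """Third step: pick the class maximizing g_i(x) = p(x | omega_i, X_i) P(omega_i)."""
    return max(training, key=lambda c: predictive_pdf(x, training[c], var, u0, var0) * priors[c])

training = {"w1": [0.0, 0.5, -0.5], "w2": [3.0, 3.5, 2.5]}
priors = {"w1": 0.5, "w2": 0.5}
print(bayes_classify(0.2, training, priors, var=1.0, u0=1.5, var0=10.0))  # w1
print(bayes_classify(2.8, training, priors, var=1.0, u0=1.5, var0=10.0))  # w2
```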
Graphical Interpretation: MLE
[Figure]
Graphical Interpretation: Bayesian
[Figure]
Results of the Iterative Process
Start with a prior distribution.
Incorporate the current batch of data.
Generate a new prior.
Goodness of new prior = goodness of old prior × goodness of interpretation.
Usually the prior distribution sharpens (Bayesian learning): the uncertainty drops.
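A minimal sketch of the iterative process for the normal-mean example, with a hypothetical `update` function and made-up batches: each posterior becomes the prior for the next batch, the posterior variance shrinks at every step, and, for this conjugate Gaussian case, the final result matches a single update on all the data at once.

```python
import math

def update(prior, batch, var):
    """One Bayesian update of a Gaussian mean with known variance `var`.

    prior = (u0, var0); returns the posterior (u_n, var_n), which serves as
    the prior for the next batch.
    """
    u0, var0 = prior
    n = len(batch)
    if n == 0:
        return prior
    u_hat = sum(batch) / n
    denom = n * var0 + var
    return ((n * var0 * u_hat + var * u0) / denom, var0 * var / denom)

var = 1.0
prior = (0.0, 4.0)
batches = [[1.2, 0.8], [1.1], [0.9, 1.3, 1.0]]

state = prior
for batch in batches:
    state = update(state, batch, var)
    print(state)  # the posterior variance shrinks after every batch

all_at_once = update(prior, [x for b in batches for x in b], var)
print(all(math.isclose(a, b) for a, b in zip(state, all_at_once)))  # True
```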
MLE vs. Bayes
MLE: faster (differentiation); single model; known model; less information.
Bayes: slow (integration); multiple weighted models; unknown model fine; more information (nonuniform prior).
Does it really make a difference?
Yes. The Bayesian classifier and MAP will in general give different results when used to classify new samples,
because MAP (like MLE) keeps only one hypothesis, while the Bayesian classifier keeps multiple weighted hypotheses.
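Example
MLE/MAP: $h^* = \arg\max_{h} P(h|\mathcal{D})$; only one hypothesis is kept.
Bayesian: $v^* = \arg\max_{v} \sum_{h} P(v|h)\,P(h|\mathcal{D})$
Suppose $P(h_1|\mathcal{D}) = 0.4$, $P(h_2|\mathcal{D}) = 0.3$, $P(h_3|\mathcal{D}) = 0.3$, and a new sample is classified as $+$ by $h_1$ and as $-$ by $h_2$ and $h_3$.
MAP: $h^* = h_1$, so the output is $+$.
Bayesian: $\sum_h P(+|h)P(h|\mathcal{D}) = 1(0.4) + 0(0.3) + 0(0.3) = 0.4$ and $\sum_h P(-|h)P(h|\mathcal{D}) = 0(0.4) + 1(0.3) + 1(0.3) = 0.6$, so the output is $-$.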
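A minimal sketch of the two decision rules, assuming, as in the standard textbook version of this example, that $h_1$ labels the new sample $+$ while $h_2$ and $h_3$ label it $-$; the dictionaries and function names are hypothetical.

```python
# Posterior probability of each hypothesis given the data D, and the label
# each hypothesis assigns to the new sample (assumed values, see above).
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
prediction = {"h1": "+", "h2": "-", "h3": "-"}

def map_classify(posterior, prediction):
    """Keep only the single most probable hypothesis and use its label."""
    h_map = max(posterior, key=posterior.get)
    return prediction[h_map]

def bayes_optimal(posterior, prediction):
    """Weight every hypothesis's vote by its posterior probability."""
    votes = {}
    for h, p_h in posterior.items():
        label = prediction[h]
        votes[label] = votes.get(label, 0.0) + p_h
    return max(votes, key=votes.get), votes

print(map_classify(posterior, prediction))      # +
print(bayes_optimal(posterior, prediction)[0])  # -
```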
Gibbs Sampler
The Bayesian classifier is optimal but can be very expensive, especially when a large number of hypotheses are kept and evaluated.
Gibbs: randomly pick one hypothesis according to the current posterior distribution.
It can be shown (later, in relation to the kNN classifier) that, under certain assumptions, the expected error is at most twice as bad as Bayesian.
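A minimal sketch of the Gibbs rule on the same three hypothetical hypotheses used in the MAP vs. Bayesian example ($P(h_1|\mathcal{D}) = 0.4$ voting $+$, $h_2$ and $h_3$ at 0.3 each voting $-$). It uses Python's `random.Random.choices` to draw one hypothesis in proportion to its posterior and then uses only that hypothesis's label, so over many draws it answers $-$ about 60% of the time at the cost of evaluating a single hypothesis per query.

```python
import random

def gibbs_classify(posterior, prediction, rng):
    """Draw one hypothesis with probability P(h|D) and use its label."""
    hypotheses = list(posterior)
    weights = [posterior[h] for h in hypotheses]
    h = rng.choices(hypotheses, weights=weights, k=1)[0]
    return prediction[h]

posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
prediction = {"h1": "+", "h2": "-", "h3": "-"}
rng = random.Random(0)
draws = [gibbs_classify(posterior, prediction, rng) for _ in range(10000)]
print(draws.count("-") / len(draws))  # about 0.6
```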
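An Example: Naïve Bayesian
Features are a conjunction of attributes $\langle a_1, a_2, \ldots, a_n \rangle$.
Bayes theorem states that the a posteriori probability should be maximized:
$c_{MAP} = \arg\max_{c_j} P(c_j|a_1, \ldots, a_n) = \arg\max_{c_j} \frac{P(a_1, \ldots, a_n|c_j)\,P(c_j)}{P(a_1, \ldots, a_n)} = \arg\max_{c_j} P(a_1, \ldots, a_n|c_j)\,P(c_j)$
The naïve Bayesian classifier assumes independence of attributes:
$c_{NB} = \arg\max_{c_j} P(c_j) \prod_{i} P(a_i|c_j)$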
Example
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
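Example (cont.)
New instance: <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>. PlayTennis = yes? Or no?
$c_{NB} = \arg\max_{c \in \{yes, no\}} P(c)\,P(Outlook=sunny|c)\,P(Temperature=cool|c)\,P(Humidity=high|c)\,P(Wind=strong|c)$
e.g. $P(PlayTennis=yes) = 9/14 = .64$, $P(PlayTennis=no) = 5/14 = .36$, $P(Wind=strong|yes) = 3/9 = .33$, $P(Wind=strong|no) = 3/5 = .6$
$P(yes)\,P(sunny|yes)\,P(cool|yes)\,P(high|yes)\,P(strong|yes) = 0.0053$
$P(no)\,P(sunny|no)\,P(cool|no)\,P(high|no)\,P(strong|no) = 0.0206$
Therefore $c_{NB} = no$.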
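A minimal sketch of the computation above with the PlayTennis table hard-coded (D2 as Hot and D6 as Cool, which the 0.0206 result requires). The helper names are hypothetical, and the conditional probabilities are raw relative frequencies with no smoothing, exactly as on the slide.

```python
from collections import Counter, defaultdict

# PlayTennis table: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def train(data):
    """Count classes and (attribute index, value) pairs per class."""
    class_counts = Counter(row[-1] for row in data)
    value_counts = defaultdict(Counter)
    for *attrs, c in data:
        for i, a in enumerate(attrs):
            value_counts[c][(i, a)] += 1
    return class_counts, value_counts

def scores(x, class_counts, value_counts):
    """P(c) * prod_i P(a_i | c) for every class c, using raw frequencies."""
    n = sum(class_counts.values())
    out = {}
    for c, n_c in class_counts.items():
        s = n_c / n
        for i, a in enumerate(x):
            s *= value_counts[c][(i, a)] / n_c
        out[c] = s
    return out

s = scores(("Sunny", "Cool", "High", "Strong"), *train(DATA))
print({c: round(v, 4) for c, v in s.items()})  # {'No': 0.0206, 'Yes': 0.0053}
print(max(s, key=s.get))                       # No
```

Normalizing the two scores gives $P(no|\ldots) \approx 0.0206 / (0.0206 + 0.0053) \approx 0.80$.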
Caveat
Guard against a zero probability $P(a_i|c_j)$, especially for small sample sizes and large sets of attribute values.
Use the m-estimate instead:
$P(a_i|c_j) = \frac{n_c + m\,p}{n + m}$
$n$: # of samples with class $c_j$
$n_c$: # of samples with class $c_j$ and attribute value $a_i$
$m$: equivalent sample size (add $m$ more samples)
$p$: prior estimate; if attribute $a$ can take $k$ values, then $p = 1/k$
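A minimal sketch of the m-estimate (the function name is hypothetical). In the PlayTennis table no "No" day is Overcast, so the raw estimate $P(Outlook=overcast|no) = 0/5$ would zero out every product it appears in; with $k = 3$ Outlook values, $p = 1/3$ and $m = 3$ it becomes $(0 + 1)/(5 + 3) = 0.125$. Choosing $m = k$ in this way is the familiar add-one smoothing.

```python
def m_estimate(n_c, n, m, p):
    """Smoothed P(a_i | c_j) = (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

print(m_estimate(0, 5, 3, 1 / 3))  # 0.125 instead of 0
print(m_estimate(3, 5, 0, 1 / 3))  # m = 0 recovers the raw frequency 3/5 = 0.6
```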
More Examples
Web page classification / newsgroup classification.
Like/dislike for web pages.
Science/sports/entertainment categories for web pages/newsgroups.
More Examples (cont.)
Select commonly occurring words as features (at least $k$ times in the documents).
Eliminate stop words (the, it, etc.) and punctuation.
Word stemming (like, liked, etc.).
$P(word_k|class)$ is independent of the word's position in the document.
Achieved 89% accuracy for classifying documents from 20 newsgroups.
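A minimal sketch of this bag-of-words naïve Bayesian text classifier, under stated assumptions: the stop-word list, the tiny training corpus and all function names are hypothetical; stemming is omitted; `min_count` plays the role of $k$; and $P(word|class)$ uses the m-estimate with $p = 1/|V|$ and $m = |V|$ (add-one smoothing). Every occurrence of a word uses the same $P(word|class)$ regardless of position, which is the independence assumption above.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def tokenize(text):
    """Lower-case, strip punctuation, drop stop words (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def train(docs, min_count=1):
    """docs: list of (text, label). Returns vocabulary, log priors, log P(word|class)."""
    doc_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in doc_counts}
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    total = Counter()
    for counts in word_counts.values():
        total.update(counts)
    vocab = {w for w, c in total.items() if c >= min_count}  # words seen >= k times
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    log_cond = {}
    for c, counts in word_counts.items():
        n_c = sum(counts[w] for w in vocab)
        # m-estimate with p = 1/|V| and m = |V|, i.e. add-one smoothing
        log_cond[c] = {w: math.log((counts[w] + 1) / (n_c + len(vocab))) for w in vocab}
    return vocab, log_prior, log_cond

def classify(text, vocab, log_prior, log_cond):
    """argmax_c log P(c) + sum over word occurrences of log P(word|c)."""
    words = [w for w in tokenize(text) if w in vocab]
    return max(log_prior, key=lambda c: log_prior[c] + sum(log_cond[c][w] for w in words))

docs = [
    ("The team won the match in overtime", "sports"),
    ("A great goal and a win for the team", "sports"),
    ("New vaccine trial shows strong immune response", "science"),
    ("Researchers publish results of the physics experiment", "science"),
]
model = train(docs)
print(classify("the team scored a late goal", *model))            # sports
print(classify("the experiment results were published", *model))  # science
```

Without stemming, "published" does not match the training word "publish"; a stemmer would merge them.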