Bayesian Classification. CS690L Data Mining: Classification (2). Bayesian Theorem: Basics. Bayesian Theorem. Training dataset. Naïve Bayes Classifier


Bayesian Classification. CS690L Data Mining: Classification (2)
Reference: J. Han and M. Kamber, Data Mining: Concepts and Techniques

Probabilistic learning: calculate explicit probabilities for a hypothesis; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

Bayesian Theorem: Basics
Let X be a data sample whose class label is unknown, and let H be a hypothesis that X belongs to class C. For classification problems, we determine P(H|X): the probability that the hypothesis holds given the observed data sample X.
P(H): prior probability of hypothesis H (i.e., the initial probability before we observe any data; it reflects the background knowledge).
P(X): probability that the sample data is observed.
P(X|H): probability of observing the sample X, given that the hypothesis holds.

Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

    P(H|X) = P(X|H) P(H) / P(X)

Informally, this can be written as posterior = likelihood x prior / evidence. The MAP (maximum a posteriori) hypothesis is the one maximizing P(X|H) P(H); the evidence P(X) is the same for all hypotheses and can be ignored in the comparison.
Practical difficulty: this requires initial knowledge of many probabilities, and can carry significant computational cost.

Naïve Bayes Classifier
A simplifying assumption: attributes are conditionally independent given the class:

    P(X|C_i) = prod_k P(x_k|C_i)

That is, the probability of observing the conjunction of elements y1 and y2, given the current class C, is the product of the probabilities of each element taken separately, given the same class: P([y1, y2]|C) = P(y1|C) x P(y2|C). No dependence relation between attributes is modeled. This greatly reduces the computation cost: only the class distributions need to be counted.
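As a toy illustration of the MAP rule above, the sketch below picks the hypothesis maximizing P(X|h) P(h). The hypothesis names and all numbers are made up for illustration; they are not from the lecture.

```python
# Hypothetical likelihoods P(X|h) and priors P(h) for two competing hypotheses.
hypotheses = {
    "H1": {"likelihood": 0.9, "prior": 0.1},
    "H2": {"likelihood": 0.3, "prior": 0.9},
}

# MAP rule: argmax_h P(X|h) * P(h). The evidence P(X) is common to all
# hypotheses, so it cancels and can be dropped from the comparison.
h_map = max(hypotheses, key=lambda h: hypotheses[h]["likelihood"] * hypotheses[h]["prior"])
print(h_map)  # H2, since 0.3 * 0.9 = 0.27 > 0.9 * 0.1 = 0.09
```

Note how a strong prior (0.9) outweighs a stronger likelihood (0.9) here; this is exactly the "prior knowledge can be combined with observed data" point above.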
Once the probability P(X|C_i) is known, assign X to the class with maximum P(X|C_i) P(C_i).

Classes: C1: buys_computer = yes; C2: buys_computer = no.
Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair).

Training dataset:

    age      income   student   credit_rating   buys_computer
    <=30     high     no        fair            no
    <=30     high     no        excellent       no
    31..40   high     no        fair            yes
    >40      medium   no        fair            yes
    >40      low      yes       fair            yes
    >40      low      yes       excellent       no
    31..40   low      yes       excellent       yes
    <=30     medium   no        fair            no
    <=30     low      yes       fair            yes
    >40      medium   yes       fair            yes
    <=30     medium   yes       excellent       yes
    31..40   medium   no        excellent       yes
    31..40   high     yes       fair            yes
    >40      medium   no        excellent       no
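The decision rule above can be sketched in a few lines of plain Python. The dataset is copied from the table; the function name `classify` is our own.

```python
# (age, income, student, credit_rating, buys_computer) tuples from the table.
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]

def classify(x):
    """Return (argmax_c P(X|c) P(c), all scores) under the naive independence assumption."""
    n = len(data)
    scores = {}
    for c in ("yes", "no"):
        rows = [r for r in data if r[-1] == c]
        score = len(rows) / n                                   # prior P(c)
        for k, v in enumerate(x):                               # product of P(x_k | c)
            score *= sum(1 for r in rows if r[k] == v) / len(rows)
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = classify(("<=30", "medium", "yes", "fair"))
print(label)                                            # yes
print(round(scores["yes"], 3), round(scores["no"], 3))  # 0.028 0.007
```

This reproduces the worked example: the "yes" score 0.028 beats the "no" score 0.007, so X is assigned to buys_computer = yes.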

Naïve Bayes Classifier: Example
Compute P(x_k|C_i) for each class:
P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(age <= 30 | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|C_i):
P(X | buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|C_i) P(C_i):
P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 x 9/14 = 0.028
P(X | buys_computer = no) P(buys_computer = no) = 0.019 x 5/14 = 0.007

X belongs to class buys_computer = yes.

Naïve Bayes: Continuous Values (1)
Weather data with numeric temperature and humidity:

    outlook    temp (F)   humidity   windy   class
    sunny      85         85         false   Don't
    sunny      80         90         true    Don't
    overcast   83         86         false   Play
    rainy      70         96         false   Play
    rainy      68         80         false   Play
    rainy      65         70         true    Don't
    overcast   64         65         true    Play
    sunny      72         95         false   Don't
    sunny      69         70         false   Play
    rainy      75         80         false   Play
    sunny      75         70         true    Play
    overcast   72         90         true    Play
    overcast   81         75         false   Play
    rainy      71         91         true    Don't

Naïve Bayes: Continuous Values (2), (3)
Per-class counts and statistics (9 Play and 5 Don't examples):

    outlook:     P(sunny | Play) = 2/9,     P(sunny | Don't) = 3/5
                 P(overcast | Play) = 4/9,  P(overcast | Don't) = 0/5
                 P(rainy | Play) = 3/9,     P(rainy | Don't) = 2/5
    temperature: μ = 73,   σ = 6.2  (Play);   μ = 74.6, σ = 7.9 (Don't)
    humidity:    μ = 79.1, σ = 10.2 (Play);   μ = 86.2, σ = 9.7 (Don't)
    windy:       P(false | Play) = 6/9,     P(false | Don't) = 2/5
                 P(true | Play) = 3/9,      P(true | Don't) = 3/5
    priors:      P(Play) = 9/14,            P(Don't) = 5/14

(μ: mean; σ: standard deviation)

Naïve Bayes: Continuous Values (4), (5)
New instance: outlook = sunny, Temp = 66, Humidity = 90, Windy = true, Class = ?
For numeric attributes, use the Gaussian (normal) density function:

    f(x) = (1 / (sqrt(2π) σ)) e^(-(x - μ)² / (2σ²))

f(Temp = 66 | Play) = (1 / (sqrt(2π) x 6.2)) e^(-(66 - 73)² / (2 x 6.2²)) = 0.0340
f(Humidity = 90 | Play) = 0.0221

Likelihood of Play = P(sunny | Play) x f(Temp = 66 | Play) x f(Humidity = 90 | Play) x P(windy = true | Play) x P(Play)
= 2/9 x 0.0340 x 0.0221 x 3/9 x 9/14 = 0.000036

This is the numeric analogue of the nominal case P(outlook = sunny, Temp = cool, Humidity = high, Windy = true | Play).
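The Gaussian densities above can be checked numerically. This is a minimal sketch; `gaussian` is our own helper name, and the means and standard deviations are the Play-class statistics from the tables.

```python
import math

def gaussian(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Class-conditional densities for the new instance, using the Play statistics:
f_temp = gaussian(66, 73.0, 6.2)    # f(Temp = 66 | Play)
f_hum = gaussian(90, 79.1, 10.2)    # f(Humidity = 90 | Play)

# Likelihood of Play for (sunny, Temp = 66, Humidity = 90, windy = true):
like_play = (2 / 9) * f_temp * f_hum * (3 / 9) * (9 / 14)
print(round(f_temp, 4), round(f_hum, 4), round(like_play, 6))  # 0.034 0.0221 3.6e-05
```

Note that the likelihood (0.000036) is tiny only because no normalization has been done yet; only its ratio to the Don't likelihood matters for classification.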

Naïve Bayes: Continuous Values (6)
Likelihood of Don't = P(sunny | Don't) x f(Temp = 66 | Don't) x f(Humidity = 90 | Don't) x P(windy = true | Don't) x P(Don't)
= 3/5 x 0.0279 x 0.0381 x 3/5 x 5/14 = 0.000137

Normalizing, P(Play | X) is about 21% and P(Don't | X) is about 79%, so the instance is classified as Don't.

Naïve Bayes Classifier: Evaluation
Advantages:
- Easy to implement.
- Good results obtained in most of the cases.
Disadvantages:
- The class conditional independence assumption causes a loss of accuracy, because in practice dependencies do exist among variables. E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by a Naïve Bayes classifier.
How to deal with these dependencies? Bayesian belief networks.

Bayesian Belief Networks
A Bayesian belief network allows a subset of the variables to be conditionally dependent. It is a graphical model of causal relationships: it represents dependencies among the variables and gives a specification of the joint probability distribution.
- Nodes: random variables.
- Links: dependencies. For example, if X and Y are the parents of Z, and Y is the parent of P, then there is no dependency between Z and P.
- The graph has no loops or cycles.

Bayesian Belief Network: An Example
Network over FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea.
The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents:

            (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
    LC      0.8       0.5        0.7        0.1
    ~LC     0.2       0.5        0.3        0.9

The joint probability distribution factorizes as

    P(z_1, ..., z_n) = prod_i P(z_i | Parents(Z_i))

Learning Bayesian Networks: several cases
- Given both the network structure and all variables observable: learn only the CPTs.
- Network structure known, some hidden variables: method of gradient descent, analogous to neural network learning.
- Network structure unknown, all variables observable: search through the model space to reconstruct the graph topology.
- Unknown structure, all hidden variables: no good algorithms are known for this purpose.

Prediction
Prediction is similar to classification: first construct a model, then use the model to predict unknown values. The major method for prediction is regression (linear and multiple regression, non-linear regression). Prediction differs from classification: classification predicts a categorical class label, while prediction models continuous-valued functions.
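The factored joint distribution above can be sketched for a fragment of the example network. The LungCancer CPT values are from the slide, but the priors P(FH) and P(S) are not given in the notes; the values below are assumptions for illustration.

```python
# Assumed priors (NOT from the lecture notes):
p_fh = 0.2          # P(FamilyHistory = true)
p_s = 0.4           # P(Smoker = true)

# P(LungCancer = true | FH, S), from the slide's CPT:
p_lc = {
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def joint(fh, s, lc):
    """Factored joint P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S)."""
    p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
    cond = p_lc[(fh, s)]
    return p * (cond if lc else 1 - cond)

# Marginal P(LC = true), summing the joint over both parents:
p_cancer = sum(joint(fh, s, True) for fh in (True, False) for s in (True, False))
print(round(p_cancer, 3))  # 0.396 under the assumed priors
```

The design point is that the full joint over n binary variables needs 2^n - 1 numbers, while the factored form only needs one CPT per node.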

Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data. One can only predict value ranges or category distributions.
Method outline:
- Minimal generalization
- Attribute relevance analysis
- Generalized linear model construction
- Prediction
Determine the major factors which influence the prediction. Data relevance analysis: uncertainty measurement, entropy analysis, expert judgment, etc.
Multi-level prediction: drill-down and roll-up analysis.

Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X. The two parameters, α and β, specify the line and are estimated from the data at hand, by applying the least squares criterion to the known values Y1, Y2, ..., X1, X2, ....
Multiple regression: Y = b0 + b1 X1 + b2 X2. Many nonlinear functions can be transformed into this form.
Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g. P(a, b, c, d) ≈ αab βac χad δbcd.

Regression: Method of Least Squares
Principle: minimize Σ_i [Ŷ(X_i) - Y_i]².
Assume Ŷ(X) = a + b X (linear relationship), so we minimize Σ_i [Y_i - (a + b X_i)]².

    Minimizing w.r.t. a:   Σ Y_i = n a + b Σ X_i                (1)
    Minimizing w.r.t. b:   Σ X_i Y_i = a Σ X_i + b Σ X_i²       (2)

Solving (1) and (2) we get:

    b = (n Σ X_i Y_i - Σ X_i Σ Y_i) / (n Σ X_i² - (Σ X_i)²)
      = Σ (X_i - X̄)(Y_i - Ȳ) / Σ (X_i - X̄)²
    a = (Σ Y_i - b Σ X_i) / n = Ȳ - b X̄

Linear Regression: Example

    X (years)    3    8    9    13   3    6    11   21   1    16
    Y (salary)   30   57   64   72   36   43   59   90   20   83

Here X̄ = 9.1 and Ȳ = 55.4.

    b = [(3 - 9.1)(30 - 55.4) + (8 - 9.1)(57 - 55.4) + ... + (16 - 9.1)(83 - 55.4)]
        / [(3 - 9.1)² + (8 - 9.1)² + ... + (16 - 9.1)²] ≈ 3.5
    a = 55.4 - (3.5)(9.1) ≈ 23.6

So the fitted line is Y = 23.6 + 3.5 X.
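The salary example above can be reproduced in a few lines. One caveat: the notes round b to 3.5 before computing a (giving a = 23.6); at full precision b is about 3.54, so a comes out as about 23.2.

```python
# Salary data from the example: x = years of experience, y = salary (in $1000s).
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

n = len(x)
x_bar = sum(x) / n   # 9.1
y_bar = sum(y) / n   # 55.4

# Least-squares slope and intercept, using the centered form of the solution:
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

print(round(b, 1), round(a, 1))  # 3.5 23.2 (the notes round b first, hence their a = 23.6)
```

Either fitted line predicts, e.g., a salary near 23.6 + 3.5 * 10 = 58.6 thousand for 10 years of experience.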

Linear regression: determine the linear regression line, with Y linearly dependent on X.