Smart Home Health Analytics, Spring 2018: Bayesian Learning
Nirmalya Roy, Department of Information Systems, University of Maryland Baltimore County
www.umbc.edu
Bayesian Learning
- Combines prior knowledge with evidence to make predictions
- Optimal, albeit impractical, classifier
- Naïve Bayes classifier (practical): assumes independence among features
- Association rule mining
Bayes Rule (Thomas Bayes)
P(C | x) = p(x | C) P(C) / p(x), where C is the class, C ∈ {1, …, K}, and x is the feature vector of an instance
- P(C | x) = probability that instance x belongs to class C (posterior)
- p(x | C) = probability that an instance drawn from class C would be x (likelihood)
- P(C) = probability of class C (prior)
- p(x) = probability of instance x (evidence)
posterior = likelihood × prior / evidence
Intuition behind the different probabilities
- Prior probability: knowledge we have about the value of C before looking at the observables x
- Likelihood: conditional probability that an event belonging to class C has the associated observation value x; what the data tells us regarding the class
- Evidence: marginal probability that an observation x is seen at all
Bayes Classifier
- Classify instance x as the class C* such that
  C* = argmax over k ∈ {1, …, K} of P(C_k | x) = argmax_k p(x | C_k) P(C_k)
- Since we are only interested in the maximum, we can ignore the denominator p(x) (the evidence)
- If the prior probability distribution over the classes is uniform, we can also ignore P(C_k), leaving argmax_k p(x | C_k)
posterior = likelihood × prior / evidence
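As a minimal sketch of the argmax rule above (the class names and probability values are made up for illustration), note that the evidence p(x) is never computed:

```python
# Bayes classifier: pick the class maximizing likelihood * prior.
# The evidence p(x) is identical for every class, so it can be dropped.
priors = {"spam": 0.4, "ham": 0.6}          # P(C_k), made-up values
likelihoods = {"spam": 0.05, "ham": 0.001}  # p(x | C_k) for one instance x

def bayes_classify(priors, likelihoods):
    # argmax over classes k of p(x | C_k) * P(C_k)
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

print(bayes_classify(priors, likelihoods))  # spam: 0.05*0.4 > 0.001*0.6
```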
Example
Suppose you want to propose to a girl, and you know the probability of her saying yes given that she is above 24 years of age. Now let the probability of the girl being above 24 years of age be P(A); so A is the event of the girl being older than 24 years. What is this probability P(A)?
(A) Likelihood (B) Posterior (C) Evidence (D) Prior
Since this information is known to you beforehand, this is the prior.
Example
Let Y be the event of a girl saying yes to you. So P(Y) is the probability of a girl (no age constraint here) saying yes to you. What is this probability?
(A) Likelihood (B) Posterior (C) Evidence (D) Prior
This is called the evidence, simply because you get to see the results of this, or you witness this event happening. Hence, this is evidence to you.
Example
P(Y | A) is the probability of a girl saying yes to you, given she is older than 24 years; that is, how likely is it for a girl older than 24 to say yes to you. What form of probability is this?
(A) Likelihood (B) Posterior (C) Evidence (D) Prior
It is the likelihood.
Example
P(A | Y) is the probability of observing a girl being greater than 24 years old, given she has already said yes to you. What form of probability is this?
(A) Likelihood (B) Posterior (C) Evidence (D) Prior
Since you cannot know this information without proposing to the girl first, this is called the posterior.
This reflects the different concepts of Bayes theorem: posterior = likelihood × prior / evidence
Bayes Classifier: Practical issue
- p(x | C) is a joint probability distribution
- We would need to know the probability of every possible instance x given every possible class C
- Even for D boolean features and K classes, that is K · 2^D probabilities
- Solution: assume the features are independent of each other:
  p(x_1, x_2, …, x_D | C) = ∏_{j=1}^{D} p(x_j | C)
Naïve Bayes Classifier
- Given a training set R, estimate the probabilities from R
- Classify a new instance x as the class C* such that
  C* = argmax over k ∈ {1, …, K} of P(C_k) ∏_{j=1}^{D} p(x_j | C_k)
- P(C_k) = |{r ∈ R : class(r) = C_k}| / |R|
- p(x_j = v | C_k) = |{r ∈ R : r_j = v and class(r) = C_k}| / |{r ∈ R : class(r) = C_k}|
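The counting estimates above can be sketched directly. The toy training set (weather features, yes/no class) is made up for illustration; this is a minimal unsmoothed version:

```python
from collections import Counter, defaultdict

# Naive Bayes by counting, following the estimates above.
# Toy training set: each row is (feature vector, class).
train = [
    (("sunny", "hot"), "no"), (("sunny", "cool"), "yes"),
    (("rain", "cool"), "yes"), (("rain", "hot"), "no"),
    (("sunny", "hot"), "no"),
]

class_counts = Counter(c for _, c in train)
# feature_counts[k][(j, v)] = #{r in class k with r_j = v}
feature_counts = defaultdict(Counter)
for x, c in train:
    for j, v in enumerate(x):
        feature_counts[c][(j, v)] += 1

def classify(x):
    n = len(train)
    best, best_score = None, -1.0
    for c, nc in class_counts.items():
        score = nc / n  # P(C_k)
        for j, v in enumerate(x):
            score *= feature_counts[c][(j, v)] / nc  # p(x_j = v | C_k)
        if score > best_score:
            best, best_score = c, score
    return best

print(classify(("sunny", "hot")))
```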
Naïve Bayes Classifier: Another practical issue
- What if x_j is a continuous feature?
- Solution #1: Assume some parameterized distribution for x_j (e.g., normal) and learn the parameters of the distribution from the data (e.g., the mean and variance of the x_j values)
- Solution #2: Discretize the feature, e.g., price ∈ R to price ∈ {low, medium, high}
Naïve Bayes Classifier: Yet another practical issue
- What if no examples in class C_k have x_j = v?
- Then p(x_j = v | C_k) = 0, so the whole product P(C_k) ∏_{j=1}^{D} p(x_j | C_k) = 0
- Solution (Laplace smoothing):
  p(x_j = v | C_k) = (|{r : r_j = v and class(r) = C_k}| + 1) / (|{r : class(r) = C_k}| + |domain(x_j)|)
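The smoothed estimate above can be written as a small helper; the counts and domain size below are hypothetical:

```python
def smoothed_prob(count_v_and_c, count_c, domain_size):
    """Laplace-smoothed estimate of p(x_j = v | C_k):
    add 1 to the numerator and |domain(x_j)| to the denominator,
    so the estimate is never exactly zero."""
    return (count_v_and_c + 1) / (count_c + domain_size)

# Unseen value: 0 matching examples out of 10 in the class,
# with 3 possible values for the feature.
print(smoothed_prob(0, 10, 3))  # 1/13, not 0
```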
Naïve Bayes Classifier
- The independence assumption is rarely true; e.g., is price independent of engine power?
- The Naïve Bayes classifier still does surprisingly well
- Simple, effective baseline for other learners
Sidebar: The Learning Curve
- Divide the data into training and testing sets
- Learn a hypothesis on increasing percentages of the training data
- Compute the accuracy of each hypothesis on the testing data
- Plot accuracy versus the percentage of training data used
- We are interested in the convergence rate and the plateau
[Figure: learning curve; accuracy (0 to 1) vs. percent of training data used (0 to 100)]
Learning Curve
[Figure: NaiveBayes vs. OneR on the Labor data, 2/3-1/3 split; accuracy vs. percent of training data used]
Association Rules
- Association rule: X → Y
- People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y
- A rule implies association, not necessarily causation
Association measures
- Support(X → Y) = P(X, Y) = #{customers who bought both X and Y} / #customers
- Confidence(X → Y) = P(Y | X) = P(X, Y) / P(X) = #{customers who bought both X and Y} / #{customers who bought X}
- Lift(X → Y) = P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)
Example
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
Market-Basket Transactions:
TID | Items
1 | Bread, Milk
2 | Bread, Diaper, Beer, Eggs
3 | Milk, Diaper, Beer, Coke
4 | Bread, Milk, Diaper, Beer
5 | Bread, Milk, Diaper, Coke
Example rules: {Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}
Example
Association Rule: an implication expression of the form X → Y, where X and Y are itemsets. Example: {Milk, Diaper} → {Beer}
Rule evaluation metrics:
- Support (s): fraction of transactions that contain both X and Y
- Confidence (c): measures how often items in Y appear in transactions that contain X
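Using the five market-basket transactions from the example above, the measures can be computed directly for {Diaper} → {Beer} (a minimal sketch of the definitions):

```python
# Support, confidence, and lift for {Diaper} -> {Beer} over the
# five market-basket transactions from the slides.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"Diaper"}, {"Beer"}
supp = support(X | Y)                    # P(X, Y)
conf = support(X | Y) / support(X)       # P(Y | X)
lift = supp / (support(X) * support(Y))  # P(X, Y) / (P(X) P(Y))
print(round(supp, 2), round(conf, 2), round(lift, 2))  # 0.6 0.75 1.25
```

A lift greater than 1 (here 1.25) means buying Diaper makes buying Beer more likely than it is in general.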
Significance of the association measures
- Confidence: a conditional probability; its value should be close to 1; it measures the strength of the rule (the rule holds with enough confidence)
- Support: the statistical significance of the rule; with a strong confidence value but a small number of such customers, the rule is worthless
- Minimum support and confidence are set by the user/entity
- Rules with higher support and confidence are searched for in the database
Association Rules
- In general, X and Y can be sets of items
- Basket analysis: e.g., customers buying hot dogs and buns are more likely to buy mustard and ketchup
- Association rule mining: given a database of customer purchases, find all association rules with high support and confidence
- Apriori algorithm [Agrawal et al., 1996]
Apriori Algorithm [Agrawal et al., 1996]
- For {X, Y, Z}, a 3-item set, to be frequent (have enough support), {X, Y}, {X, Z}, and {Y, Z} must each be frequent
- If {X, Y} is not frequent, none of its supersets can be frequent
- Once we find the frequent k-item sets, we convert them to rules: X → Y, Z, … and X, Y → Z, …
Apriori Algorithm
- Find all itemsets with enough support: if an itemset of size k does not have enough support, then no superset of this itemset will have enough support
- For each frequent itemset, find all association rules X → Y with enough confidence: rules of the form {A,B} → {C,D} can only be confident if both {A,B,C} → {D} and {A,B,D} → {C} are confident
- WEKA: Associate
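The frequent-itemset phase described above can be sketched as follows, reusing the five transactions from the earlier example. This is a minimal level-wise version, not an optimized implementation:

```python
from itertools import combinations

# Minimal Apriori sketch: find all itemsets with support >= min_support,
# growing candidates one level at a time and pruning any candidate that
# has an infrequent subset (the anti-monotonicity property).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def apriori(transactions, min_support):
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    items = sorted({i for t in transactions for i in t})
    frequent, level = [], [frozenset([i]) for i in items]
    while level:
        level = [s for s in level if support(s) >= min_support]
        frequent.extend(level)
        # Candidate (k+1)-itemsets: unions of frequent k-itemsets.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        seen = set(frequent)
        level = [c for c in candidates
                 if all(frozenset(sub) in seen
                        for sub in combinations(c, len(c) - 1))]
    return frequent

freq = apriori(transactions, 0.6)
print(sorted(tuple(sorted(s)) for s in freq))
# 4 frequent singletons and 4 frequent pairs at min_support = 0.6
```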
Summary: Bayesian Learning
- Optimal learning framework: incorporates background knowledge
- Practical algorithms: Naïve Bayes, association rule mining
One-Minute Learning
- What is Bayes Theorem? It provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
- What is the difference between posterior and prior probability, i.e., P(h) vs. P(h | D)? The posterior probability P(h | D) reflects our confidence that hypothesis h holds after we have seen the training data D. It reflects the influence of the training data, in contrast to the prior probability P(h), which is independent of the data.
Practice Problem: Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is actually not present. Furthermore, 0.008 of the entire population have this cancer.
P(cancer) = ?, P(¬cancer) = ?
P(+ | cancer) = ?, P(− | cancer) = ?
P(+ | ¬cancer) = ?, P(− | ¬cancer) = ?
Hint: Find the MAP hypothesis.
P(+ | cancer) · P(cancer) = ?
P(+ | ¬cancer) · P(¬cancer) = ?
Practice Problem (solution): Does the patient have cancer or not?
P(cancer) = 0.008, P(¬cancer) = 0.992
P(+ | cancer) = 0.98, P(− | cancer) = 0.02
P(+ | ¬cancer) = 0.03, P(− | ¬cancer) = 0.97
Find the MAP hypothesis:
P(+ | cancer) · P(cancer) = 0.98 × 0.008 = 0.0078
P(+ | ¬cancer) · P(¬cancer) = 0.03 × 0.992 = 0.0298
Thus, h_MAP = ¬cancer
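The arithmetic above can be checked directly, and normalizing the two scores gives the actual posterior:

```python
# MAP hypothesis for the cancer test (numbers from the slide).
p_cancer, p_not = 0.008, 0.992
p_pos_given_cancer, p_pos_given_not = 0.98, 0.03

score_cancer = p_pos_given_cancer * p_cancer  # 0.98 * 0.008 = 0.00784
score_not = p_pos_given_not * p_not           # 0.03 * 0.992 = 0.02976
h_map = "cancer" if score_cancer > score_not else "not cancer"
print(h_map)  # not cancer

# Dividing by the evidence gives the posterior P(cancer | +):
posterior = score_cancer / (score_cancer + score_not)
print(round(posterior, 3))  # 0.209
```

Even after a positive test, the posterior probability of cancer is only about 21%, because the disease is so rare in the population.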