9/8/07 MIST.6060 Business Intelligence and Data Mining
aoba L All Rights Reserved.

Naïve Bayes Classifier

Terminology

Predictors: the attributes (variables) whose values are used for prediction and classification. Predictors are also called input variables, features, or independent variables.

There is no single dominant term for the attribute whose values are to be predicted. In statistics, it is often called the response or dependent variable. In the computing field, it is called the output, target, or outcome attribute. For classification problems, it is typically called the class attribute. In Weka, the term class attribute is used regardless of whether the attribute is categorical or numeric.

The two types of terms above make sense only for supervised learning tasks (i.e., classification and numeric prediction).

A fraud detection example: The task is to detect whether a transaction is normal or fraudulent. The existing (training) data include a class attribute (with two classes: normal, fraudulent) and two predictors: Transaction Time (with two categories: day, night) and Transaction Amount (with two categories: small, large).

Classification Performance Measures

Misclassification error rate:

$$\text{error rate} = \frac{\text{number of misclassified records}}{\text{total number of records}}$$

Classification accuracy (rate):

$$\text{accuracy} = \frac{\text{number of correctly classified records}}{\text{total number of records}} = 1 - \text{error rate}$$

Classification (Confusion) Matrix (figure)

The Naïve Rule

Classify a record based on the majority class.

Example (fraud detection): In deciding whether a transaction is normal or fraudulent, the training data show that the majority of transactions are normal; so, classify this transaction as normal.

Example: win/loss prediction in sports.
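The error rate, the accuracy, and the naïve rule can be sketched in a few lines of Python. This is only an illustration: the function names `naive_rule` and `error_rate` are made up here, and the class counts (6 normal, 4 fraudulent) are taken from the FraudDetect.arff data listed later in these notes.

```python
from collections import Counter

# Class labels of the ten training records in the fraud detection example
# (6 normal, 4 fraudulent, as listed in FraudDetect.arff).
actual = ["normal"] * 6 + ["fraudulent"] * 4


def naive_rule(labels):
    """Return the majority class; the naive rule predicts it for every record."""
    return Counter(labels).most_common(1)[0][0]


def error_rate(actual, predicted):
    """Number of misclassified records divided by the total number of records."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


majority = naive_rule(actual)
predicted = [majority] * len(actual)   # same prediction for every record

err = error_rate(actual, predicted)
acc = 1 - err                          # accuracy = 1 - error rate
print(majority, err, acc)
```

On the training data the naïve rule labels every record as normal, so the four fraudulent records are exactly the ones it misclassifies.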
Conditional Probability

Conditional probability is the probability of an event $C$ occurring given that some event $x$ has occurred; it is written as $P(C \mid x)$.

Example: Toss a die and guess the number appearing on the upper face. The probability of guessing right is 1/6. But if you are told that it is an even number (the condition), then the (conditional) probability of guessing right becomes 1/3.

A classification problem is essentially a problem of estimating the conditional probability of a class value ($C$), given a set of predictor values ($x_1, x_2, \ldots, x_p$).

Bayes Theorem in the Context of Classification

Let $C_1, \ldots, C_m$ be $m$ possible classes. Let $x_1, x_2, \ldots, x_p$ be a set of predictor values of a record. Then the probability that the record belongs to class $C_i$ is:

$$P(C_i \mid x_1, \ldots, x_p) = \frac{P(x_1, \ldots, x_p \mid C_i)\, P(C_i)}{P(x_1, \ldots, x_p \mid C_1)\, P(C_1) + \cdots + P(x_1, \ldots, x_p \mid C_m)\, P(C_m)}, \qquad (1)$$

where $P(C_i)$ is called the prior probability and $P(C_i \mid x_1, \ldots, x_p)$ is called the posterior probability. Note that the naïve rule mentioned earlier simply uses the prior probability for classification.

Naïve Bayes is primarily used for situations where all attributes are categorical (numeric attribute values are typically grouped into intervals).

Example (fraud detection): The problem has two possible classes ($m = 2$): $C_1$ = normal and $C_2$ = fraudulent. There are two predictors ($p = 2$): Transaction Time ($x_1$) and Transaction Amount ($x_2$). To classify a transaction that occurs at night with a large transaction amount, the Bayesian approach finds the probabilities $P(\text{normal} \mid \text{night}, \text{large})$ and $P(\text{fraudulent} \mid \text{night}, \text{large})$. The decision is based on which probability is larger.

Bayes Theorem is also called Bayes Rule or Bayes Formula. For more general descriptions of the subject (not required), see: http://www.cs.ubc.ca/~murphyk/bayes/bayesrule.html (easier), or http://en.wikipedia.org/wiki/bayes'_theorem (harder)
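The die example can be checked by counting equally likely faces. The sketch below is illustrative only: the helper `prob` and the choice of 4 as the guessed face are assumptions made for this example (any even face gives the same numbers). It also confirms that Bayes Theorem gives the same conditional probability as direct counting.

```python
from fractions import Fraction

faces = range(1, 7)   # outcomes of a fair six-sided die
guess = 4             # hypothetical guess; assumed to be an even face


def is_even(face):
    return face % 2 == 0


def is_guess(face):
    return face == guess


def prob(event, given=lambda face: True):
    """P(event | given), obtained by counting equally likely faces."""
    condition = [f for f in faces if given(f)]
    hits = [f for f in condition if event(f)]
    return Fraction(len(hits), len(condition))


p_unconditional = prob(is_guess)               # P(guess is right)
p_given_even = prob(is_guess, given=is_even)   # P(guess is right | even)

# Bayes Theorem: P(guess | even) = P(even | guess) * P(guess) / P(even)
p_bayes = prob(is_even, given=is_guess) * p_unconditional / prob(is_even)

print(p_unconditional, p_given_even, p_bayes)
```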
Naïve Bayes Classifier

Problems with the exact Bayes: Consider the right-hand side of the Bayes Theorem (1). While it is easy to estimate the prior probability $P(C_i)$, it is computationally very expensive to estimate the conditional probability $P(x_1, \ldots, x_p \mid C_i)$ when the number of predictors and/or the number of categories of some predictors is large or even modestly large. The computation involves evaluating all possible combinations of the $x_1, \ldots, x_p$ values given $C_i$. Furthermore, some possible combinations might not have any occurrence in the training data, making it difficult to estimate the probabilities for new (test) records that have such combinations.

Naïve Bayes assumes that the predictors are conditionally independent of each other given the class value. Under this assumption, the conditional probability can be easily computed by

$$P(x_1, x_2, \ldots, x_p \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_p \mid C_i). \qquad (2)$$

It turns out that it is not necessary to compute the denominator of the right-hand side of equation (1) (to be explained later in an example). So, after substituting equation (2) into the numerator of the right-hand side of equation (1), the computation of the posterior probabilities becomes fairly easy.

A classification model constructed based on this conditional independence assumption is called a Naïve Bayes or Simple Bayes classifier.

An Illustrative Example: Fraud Detection

The FraudDetect.arff file:

    @relation FraudDetect                            % dataset name
    @attribute TransactionTime {night,day}           % attribute name & list of all values
    @attribute TransactionAmount {small,large}       % attribute name & list of all values
    @attribute Class {normal,fraudulent}             % attribute name & list of all values
    @data                                            % data start after this
    night, small, normal
    day, small, normal
    day, large, normal
    day, large, normal
    day, small, normal
    day, small, normal
    night, small, fraudulent
    night, large, fraudulent
    day, large, fraudulent
    night, large, fraudulent
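The two problems with the exact Bayes approach show up even in this small data set. The sketch below is an illustration under stated assumptions: the ten records are typed in by hand from the listing above, and `table_sizes` is a hypothetical helper that assumes every predictor has the same number of categories. It lists the predictor/class combinations that never occur in the training data and compares how many probabilities per class each approach has to estimate.

```python
from collections import Counter
from itertools import product

# The ten records of FraudDetect.arff: (TransactionTime, TransactionAmount, Class)
records = [
    ("night", "small", "normal"), ("day", "small", "normal"),
    ("day", "large", "normal"), ("day", "large", "normal"),
    ("day", "small", "normal"), ("day", "small", "normal"),
    ("night", "small", "fraudulent"), ("night", "large", "fraudulent"),
    ("day", "large", "fraudulent"), ("night", "large", "fraudulent"),
]
times = ("night", "day")
amounts = ("small", "large")
classes = ("normal", "fraudulent")

# Exact Bayes needs a count for every (time, amount) combination within each class.
joint_counts = Counter(records)
unseen = [combo for combo in product(times, amounts, classes)
          if joint_counts[combo] == 0]
print(unseen)


def table_sizes(p, k):
    """Probabilities to estimate per class for p predictors with k categories each.

    Returns (exact, naive): exact Bayes needs one value per combination (k ** p),
    while Naive Bayes needs one value per predictor category (p * k).
    """
    return k ** p, p * k


print(table_sizes(2, 2))    # the fraud example
print(table_sizes(10, 5))   # a modestly larger, hypothetical problem
```

Two of the eight combinations have no training records, so exact Bayes would have no data to estimate them from. Naïve Bayes only needs the single-predictor counts, all of which are nonzero here.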
Consider the first record, which has {TransactionTime = night} and {TransactionAmount = small}. We first compute the prior probabilities for the class attribute:

$P(\text{Class} = \text{normal}) = 6/10$ (6 out of the 10 records are normal),
$P(\text{Class} = \text{fraudulent}) = 4/10$ (4 out of the 10 records are fraudulent).

We then compute the conditional probabilities for {TransactionTime = night}, given a certain Class value:

$P(\text{TransactionTime} = \text{night} \mid \text{Class} = \text{normal}) = 1/6$ (1 of 6 normal records has TransactionTime = night),
$P(\text{TransactionTime} = \text{night} \mid \text{Class} = \text{fraudulent}) = 3/4$ (3 of 4 fraudulent records have TransactionTime = night).

Similarly, we can obtain the conditional probabilities for {TransactionAmount = small}, given a certain Class value:

$P(\text{TransactionAmount} = \text{small} \mid \text{Class} = \text{normal}) = 4/6$ (4 of 6 normal records are small),
$P(\text{TransactionAmount} = \text{small} \mid \text{Class} = \text{fraudulent}) = 1/4$ (1 of 4 fraudulent records is small).

Finally, we compute the posterior probabilities based on equations (1) and (2) [substituting equation (2) into the numerator of the right-hand side of equation (1)]:

$$\begin{aligned}
&P(\text{normal} \mid \text{TransactionTime} = \text{night}, \text{TransactionAmount} = \text{small}) \\
&\quad = \frac{1}{D}\,\big[P(\text{TransactionTime} = \text{night} \mid \text{Class} = \text{normal})\; P(\text{TransactionAmount} = \text{small} \mid \text{Class} = \text{normal})\; P(\text{Class} = \text{normal})\big] \\
&\quad = \frac{1}{D}\left(\frac{1}{6}\right)\left(\frac{4}{6}\right)\left(\frac{6}{10}\right) = \frac{1}{D}\left(\frac{1}{15}\right)
\end{aligned}$$

and

$$\begin{aligned}
&P(\text{fraudulent} \mid \text{TransactionTime} = \text{night}, \text{TransactionAmount} = \text{small}) \\
&\quad = \frac{1}{D}\left(\frac{3}{4}\right)\left(\frac{1}{4}\right)\left(\frac{4}{10}\right) = \frac{1}{D}\left(\frac{3}{40}\right)
\end{aligned}$$

where $D$ is the denominator in the Bayes formula (1); this is not calculated because it will be cancelled out when we normalize the posterior probabilities (i.e., scale the probabilities such that they add up to 1) as follows:
$$P(\text{normal} \mid \text{TransactionTime} = \text{night}, \text{TransactionAmount} = \text{small}) = \frac{\frac{1}{D}\left(\frac{1}{15}\right)}{\frac{1}{D}\left(\frac{1}{15}\right) + \frac{1}{D}\left(\frac{3}{40}\right)} = 0.47059,$$

$$P(\text{fraudulent} \mid \text{TransactionTime} = \text{night}, \text{TransactionAmount} = \text{small}) = \frac{\frac{1}{D}\left(\frac{3}{40}\right)}{\frac{1}{D}\left(\frac{1}{15}\right) + \frac{1}{D}\left(\frac{3}{40}\right)} = 0.52941.$$

Based on the estimated probabilities, this record should be classified as fraudulent. However, the actual class value of this record is normal, as shown in the data set. So the Naïve Bayes classifier misclassifies this record. In fact, both probabilities are close to 0.5, so this is a difficult decision.

Naïve Bayes in Weka

1. Click Open file, find and open the FraudDetect.arff file. By default, the last attribute is the class attribute.
2. Click Classify / Choose / bayes / NaiveBayes.
3. Select Use training set. Click More options.
4. Click the Choose button for Output predictions, and specify PlainText. Keep the other options unchanged, and click OK.
5. Click Start to get the results.

It can be observed that the probabilities estimated in Weka for the first record are slightly different from those calculated above by hand. This is because Weka incorporates a small number in the Naïve Bayes computation to handle the zero probability case (WFHP, pp. 99-00).
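The hand calculation for the first record can be reproduced with a short Python sketch. The function `posteriors` and its `alpha` parameter are hypothetical names introduced for this illustration. With `alpha=0` it follows the hand calculation exactly. With `alpha=1` it adds one to every count, which is one common way to "incorporate a small number"; that this matches Weka's exact computation is an assumption, not something verified here.

```python
from fractions import Fraction

# The ten records of FraudDetect.arff: (TransactionTime, TransactionAmount, Class)
records = [
    ("night", "small", "normal"), ("day", "small", "normal"),
    ("day", "large", "normal"), ("day", "large", "normal"),
    ("day", "small", "normal"), ("day", "small", "normal"),
    ("night", "small", "fraudulent"), ("night", "large", "fraudulent"),
    ("day", "large", "fraudulent"), ("night", "large", "fraudulent"),
]


def posteriors(records, time, amount, alpha=0):
    """Normalized Naive Bayes posteriors for one (time, amount) pair.

    alpha is a hypothetical smoothing constant added to every count;
    alpha=0 reproduces the hand calculation in these notes.
    """
    classes = sorted({r[2] for r in records})
    n_categories = 2   # each predictor in this data set has two categories
    scores = {}
    for c in classes:
        rows = [r for r in records if r[2] == c]
        prior = Fraction(len(rows) + alpha, len(records) + alpha * len(classes))
        p_time = Fraction(sum(r[0] == time for r in rows) + alpha,
                          len(rows) + alpha * n_categories)
        p_amount = Fraction(sum(r[1] == amount for r in rows) + alpha,
                            len(rows) + alpha * n_categories)
        scores[c] = prior * p_time * p_amount   # numerator of equation (1)
    total = sum(scores.values())                # plays the role of D
    return {c: s / total for c, s in scores.items()}


exact = posteriors(records, "night", "small")
smoothed = posteriors(records, "night", "small", alpha=1)
print({c: float(p) for c, p in exact.items()})
print({c: float(p) for c, p in smoothed.items()})
```

Without smoothing the posteriors are 8/17 for normal and 9/17 for fraudulent, matching 0.47059 and 0.52941 above. Under the add-one assumption they move even closer to 0.5 (63/127 and 64/127), which shows how a small correction to the counts can shift the estimates slightly.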