7 Classification: Naïve Bayes Classifier

Size: px

Start display at page:

Download "7 Classification: Naïve Bayes Classifier"

Randall Atkinson
6 years ago
Views:

1 CSE4334/5334 Data Mining 7 Classifiation: Naïve Bayes Classifier Chengkai Li Department of Computer Siene and Engineering University of Texas at rlington Fall 017 Slides ourtesy of ang-ning Tan, Mihael Steinbah and Vipin Kumar

2 Bayes Classifier probabilisti framework for solving lassifiation problems Conditional robability: Bayes theorem:,, C C C C C C C C

3 Example of Bayes Theorem 3 Given: o dotor knows that meningitis auses stiff nek 50% of the time o rior probability of any patient having meningitis is 1/50,000 o rior probability of any patient having stiff nek is 1/0 If a patient has stiff nek, what s the probability he/she has meningitis? S M M 0.5 1/ M S S 1/

4 Bayesian Classifiers Consider eah attribute and lass label as random variables Given a reord with attributes 1,,, n o Goal is to predit lass C o Speifially, we want to find the value of C that maximizes C 1,,, n Can we estimate C 1,,, n diretly from data? 4

5 Bayesian Classifiers pproah: o ompute the posterior probability C 1,,, n for all values of C using the Bayes theorem 5 C! 1 1! n C C! o Choose value of C that maximizes C 1,,, n o Equivalent to hoosing value of C that maximizes 1,,, n C C How to estimate 1,,, n C? n 1 n

6 Naïve Bayes Classifier ssume independene among attributes i when lass is given: o 1,,, n C 1 C j C j n C j o Can estimate i C j for all i and C j. o New point is lassified to C j if C j i C j is maximal. 6 6

7 10 7 How to Estimate robabilities from Data? Tid Refund Marital Status Taxable Inome 1 Yes Single 15K No No Married 100K No 3 No Single 70K No 4 Yes Married 10K No Evade 5 No Divored 95K Yes 6 No Married 60K No 7 Yes Divored 0K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Class: C N /N o e.g., No 7/10, Yes 3/10 For disrete attributes: i C k ik / N o where ik is number of instanes having attribute i and belongs to lass C k o Examples: StatusMarried No 4/7 RefundYes Yes0

8 How to Estimate robabilities from Data? For ontinuous attributes: o Disretize the range into bins and thus transform the attribute into an ordinal attribute. o robability density estimation: o ssume attribute follows a normal distribution o Use data to estimate parameters of distribution e.g., mean and standard deviation o One probability distribution is known, an use it to estimate the onditional probability i 8

9 10 9 How to Estimate robabilities from Data? Tid Refund Marital Status Taxable Inome 1 Yes Single 15K No No Married 100K No 3 No Single 70K No 4 Yes Married 10K No Evade 5 No Divored 95K Yes 6 No Married 60K No 7 Yes Divored 0K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Inome 10 No Normal distribution: ps ij o One for eah i, j pair For Inome, ClassNo: o If ClassNo o sample mean 110 o sample variane 975 p i - e j e - i s ij ij -µ 0.007

Example of Naïve Bayes Classifier Given a Test Reord: X ClassNo RefundNo ClassNo Married ClassNo Inome10K ClassNo 4/7 4/7 0.007 0.

10 Example of Naïve Bayes Classifier Given a Test Reord: X ClassNo RefundNo ClassNo Married ClassNo Inome10K ClassNo 4/7 4/ X ClassYes RefundNo ClassYes Married ClassYes Inome10K ClassYes Sine X NoNo > X YesYes Therefore No X > Yes X > Class No X Refund No,Married,Inome 10K 10

11 Naïve Bayes Classifier 11 If one of the onditional probability is zero, then the entire expression beomes zero robability estimation: Original : Laplae : i i C C m - estimate : i N N N N i i C N N i + mp + m : number of lasses p: prior probability m: parameter

12 Example of Naïve Bayes Classifier Name Give Birth Can Fly Live in Water Have Legs Class human yes no no yes mammals python no no no no non-mammals salmon no no yes no non-mammals whale yes no yes no mammals frog no no sometimes yes non-mammals komodo no no no yes non-mammals bat yes yes no yes mammals pigeon no yes no yes non-mammals at yes no no yes mammals leopard shark yes no yes no non-mammals turtle no no sometimes yes non-mammals penguin no no sometimes yes non-mammals porupine yes no no yes mammals eel no no yes no non-mammals salamander no no sometimes yes non-mammals gila monster no no no yes non-mammals platypus no no no yes mammals owl no yes no yes non-mammals dolphin yes no yes no mammals eagle no yes no yes non-mammals 1 Give Birth Can Fly Live in Water Have Legs Class yes no yes no? : attributes M: mammals N: non-mammals 6 6 M N M M N N MM > NN > Mammals

13 Naïve Bayes Summary Robust to isolated noise points Handle missing values by ignoring the instane during probability estimate alulations Robust to irrelevant attributes Independene assumption may not hold for some attributes o Use other tehniques suh as Bayesian Belief Networks BBN 13

Advanced classifica-on methods

Advanced classifica-on methods Instance-based classifica-on Bayesian classifica-on Instance-Based Classifiers Set of Stored Cases Atr1... AtrN Class A B B C A C B Store the training records Use training