BAYES CLASSIFIER
www.aplysit.com  www.ivan.siregar.biz
APLYSIT IT SOLUTION CENTER, Jl. Ir. H. Djuanda 109, Bandung
Ivan Michael Siregar  ivan.siregar@gmail.com
Data Mining 2010
Bayesian Method

Our focus this lecture: learning and classification methods based on probability theory.
Bayes theorem plays a critical role in probabilistic learning and classification.
Uses the prior probability of each category, given no information about an item.
Categorization produces a posterior probability distribution over the possible categories, given a description of an item.
Bayes Theorem

P(A): probability of A
P(A|B): probability of A given B
P(A,B): probability of A and B together

P(A|B) = P(A,B) / P(B)

We can predict P(A|B) if P(B|A), P(A), and P(B) are given:

P(A|B) = P(B|A) P(A) / P(B)

(Just go to the example on page 13 for a quick understanding!)
Basic Probability Formulas

Product rule:  P(A,B) = P(A|B) P(B) = P(B|A) P(A)
Sum rule:      P(A or B) = P(A) + P(B) - P(A,B)
Bayes theorem: P(h|D) = P(D|h) P(h) / P(D)
Theorem of total probability: if events A1, ..., An are mutually exclusive and their probabilities sum to 1, then
               P(B) = sum over i = 1..n of P(B|Ai) P(Ai)
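These rules can be checked numerically. A minimal sketch with made-up probabilities (the numbers below are illustrative, not from the slides):

```python
# Numerical check of the product rule, Bayes theorem, and the theorem of
# total probability on made-up numbers (illustrative only).
p_a = 0.4               # P(A)
p_b_given_a = 0.5       # P(B|A)
p_b_given_not_a = 0.25  # P(B|~A)

# Theorem of total probability: A and ~A are mutually exclusive and exhaustive,
# so P(B) = P(B|A) P(A) + P(B|~A) P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Product rule: P(A,B) = P(B|A) P(A)
p_ab = p_b_given_a * p_a

# Bayes theorem: P(A|B) = P(B|A) P(A) / P(B), which equals P(A,B) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 2))          # 0.35
print(round(p_ab, 2))         # 0.2
print(round(p_a_given_b, 4))  # 0.5714
```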
Bayes Theorem

Given a hypothesis h and data D which bears on the hypothesis:
P(h): independent probability of h: the prior probability
P(D): independent probability of D
P(D|h): conditional probability of D given h: the likelihood
P(h|D): conditional probability of h given D: the posterior probability
Does the Patient Have Cancer or Not?

A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 99% of the cases, and a correct negative result in only 95% of the cases. Furthermore, only 0.03 of the entire population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?
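The three questions above can be answered directly with Bayes theorem. A minimal sketch, assuming "0.03 of the entire population" means a prior P(cancer) = 0.03:

```python
# Worked answer to the cancer question, assuming P(cancer) = 0.03.
p_cancer = 0.03
p_pos_given_cancer = 0.99     # correct positive rate
p_neg_given_healthy = 0.95    # correct negative rate
p_pos_given_healthy = 1 - p_neg_given_healthy  # false positive rate

# Total probability: P(+) = P(+|cancer) P(cancer) + P(+|~cancer) P(~cancer)
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

# Bayes theorem: P(cancer|+) = P(+|cancer) P(cancer) / P(+)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
p_healthy_given_pos = 1 - p_cancer_given_pos

print(round(p_cancer_given_pos, 2))   # 0.38
print(round(p_healthy_given_pos, 2))  # 0.62
# Diagnosis: "no cancer" is still the more probable hypothesis,
# despite the positive test, because the disease is rare.
```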
Maximum A Posteriori

Based on Bayes theorem, we can compute the Maximum A Posteriori (MAP) hypothesis for the data. We are interested in the best hypothesis for some space H given observed training data D. (H: the set of all hypotheses.)

h_MAP = argmax over h in H of P(h|D)
      = argmax over h in H of P(D|h) P(h) / P(D)
      = argmax over h in H of P(D|h) P(h)

Note that we can drop P(D), as the probability of the data is constant and independent of the hypothesis.
Maximum Likelihood

Now assume that all hypotheses are equally probable a priori, i.e. P(hi) = P(hj) for all hi, hj in H. This is called assuming a uniform prior. It simplifies computing the posterior:

h_ML = argmax over h in H of P(D|h)

This hypothesis is called the maximum likelihood (ML) hypothesis.
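The difference between the two rules is just which score gets maximized. A sketch over a toy hypothesis space (the hypothesis names and numbers are made up for illustration; only the argmax pattern comes from the slides):

```python
# MAP vs ML over a toy hypothesis space with made-up probabilities.
priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}       # P(h)
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.8}  # P(D|h)

# MAP: maximize P(D|h) P(h); P(D) is constant so it is dropped.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML: uniform prior assumed, so maximize P(D|h) alone.
h_ml = max(likelihoods, key=likelihoods.get)

print(h_map)  # h1 -- the strong prior outweighs its weaker likelihood
print(h_ml)   # h3 -- best likelihood wins once priors are ignored
```

With a non-uniform prior the two rules can disagree, as here; under a uniform prior they coincide.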
Desirable Properties of the Bayes Classifier

Incrementality: with each training example, the prior and the likelihood can be updated dynamically: flexible and robust to errors.
Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied with the probability of the hypothesis given the training data.
Probabilistic hypothesis: outputs not only a classification, but a probability distribution over all classes.
Bayes Classifier

Assumption: the training set consists of instances of different classes, described as conjunctions of attribute values.
Task: classify a new instance d, described by a tuple of attribute values d = (x1, x2, ..., xn), into one of the classes cj in C.
Key idea: assign the most probable class using Bayes theorem:

c_MAP = argmax over cj in C of P(cj | x1, x2, ..., xn)
      = argmax over cj in C of P(x1, x2, ..., xn | cj) P(cj) / P(x1, x2, ..., xn)
      = argmax over cj in C of P(x1, x2, ..., xn | cj) P(cj)
Parameter Estimation

P(cj) can be estimated from the frequency of classes in the training examples.
P(x1, x2, ..., xn | cj) has O(|X|^n |C|) parameters and could only be estimated if a very, very large number of training examples were available.
Independence assumption: attribute values are conditionally independent given the target value: naive Bayes.

P(x1, x2, ..., xn | cj) = product over i of P(xi | cj)

c_NB = argmax over cj in C of P(cj) product over i of P(xi | cj)
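The parameter blow-up that the independence assumption avoids can be made concrete. A sketch for the simplest case of n binary attributes: the full conditional table needs 2^n - 1 free parameters per class, while naive Bayes needs only n per class:

```python
# Parameter counts per class for n binary attributes:
# full joint conditional P(x1..xn | c) vs naive Bayes P(xi | c).
def full_joint_params(n_attrs: int) -> int:
    # one probability per attribute combination, minus 1 (they sum to 1)
    return 2 ** n_attrs - 1

def naive_bayes_params(n_attrs: int) -> int:
    # one free parameter per binary attribute
    return n_attrs

for n in (4, 10, 20):
    print(n, full_joint_params(n), naive_bayes_params(n))
# 4 15 4
# 10 1023 10
# 20 1048575 20
```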
Properties

Estimating P(xi|cj) instead of P(x1, x2, ..., xn | cj) greatly reduces the number of parameters (and the data sparseness).
The learning step in naive Bayes consists of estimating P(xi|cj) and P(cj) based on the frequencies in the training data.
An unseen instance is classified by computing the class that maximizes the posterior.
When conditional independence is satisfied, naive Bayes corresponds to MAP classification.
Example: Play Tennis

Outlook        Temperature    Humidity   Windy    Play (CLASS)
(categorical)  (categorical)  (binary)   (binary)
Sunny          Hot            High       False    no
Sunny          Hot            High       True     no
Overcast       Hot            High       False    yes
Rainy          Mild           High       False    yes
Rainy          Cool           Normal     False    yes
Rainy          Cool           Normal     True     no
Overcast       Cool           Normal     True     yes
Sunny          Mild           High       False    no
Sunny          Cool           Normal     False    yes
Rainy          Mild           Normal     False    yes
Sunny          Mild           Normal     True     yes
Overcast       Mild           High       True     yes
Overcast       Hot            Normal     False    yes
Rainy          Mild           High       True     no

Predict the class label for X = (Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True).
Example: Play Tennis

Counts and conditional probabilities estimated from the training data:

Outlook       Yes  No  |  P(.|yes)  P(.|no)
  Sunny        2    3  |    2/9      3/5
  Overcast     4    0  |    4/9      0/5
  Rainy        3    2  |    3/9      2/5
Temperature
  Hot          2    2  |    2/9      2/5
  Mild         4    2  |    4/9      2/5
  Cool         3    1  |    3/9      1/5
Humidity
  High         3    4  |    3/9      4/5
  Normal       6    1  |    6/9      1/5
Windy
  False        6    2  |    6/9      2/5
  True         3    3  |    3/9      3/5
Play           9    5  |    9/14     5/14

The probability of play = yes given X = (x1, x2, x3, x4) is:

P(yes|X) = P(X|yes) P(yes) / P(X) = P(x1|yes) P(x2|yes) P(x3|yes) P(x4|yes) P(yes) / P(X)
Example: Play Tennis

Compare P(yes|X) and P(no|X). Since P(X) is the same in both posteriors, it suffices to compare the numerators:

P(X|yes) P(yes) = 2/9 * 3/9 * 3/9 * 3/9 * 9/14 ≈ 0.0053
P(X|no)  P(no)  = 3/5 * 1/5 * 4/5 * 3/5 * 5/14 ≈ 0.0206

Because the value of P(X|no) P(no) is greater than P(X|yes) P(yes), the test record X = (Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True) is classified with class label Play tennis = No.
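The whole play-tennis calculation can be reproduced end to end: estimate P(c) and P(xi|c) from frequencies, then score the query instance. A minimal sketch using the 14 training records from the slides:

```python
# Naive Bayes on the play-tennis data: estimate P(c) and P(xi|c) from
# frequencies in the training set, then score the query instance X.
from collections import Counter

data = [  # (Outlook, Temperature, Humidity, Windy, Play)
    ("Sunny", "Hot", "High", "False", "no"),
    ("Sunny", "Hot", "High", "True", "no"),
    ("Overcast", "Hot", "High", "False", "yes"),
    ("Rainy", "Mild", "High", "False", "yes"),
    ("Rainy", "Cool", "Normal", "False", "yes"),
    ("Rainy", "Cool", "Normal", "True", "no"),
    ("Overcast", "Cool", "Normal", "True", "yes"),
    ("Sunny", "Mild", "High", "False", "no"),
    ("Sunny", "Cool", "Normal", "False", "yes"),
    ("Rainy", "Mild", "Normal", "False", "yes"),
    ("Sunny", "Mild", "Normal", "True", "yes"),
    ("Overcast", "Mild", "High", "True", "yes"),
    ("Overcast", "Hot", "Normal", "False", "yes"),
    ("Rainy", "Mild", "High", "True", "no"),
]

class_counts = Counter(row[-1] for row in data)
n = len(data)

def score(x, c):
    # P(X|c) P(c) under the conditional-independence assumption
    p = class_counts[c] / n
    for i, value in enumerate(x):
        matches = sum(1 for row in data if row[-1] == c and row[i] == value)
        p *= matches / class_counts[c]
    return p

X = ("Sunny", "Cool", "High", "True")
s_yes = score(X, "yes")  # 2/9 * 3/9 * 3/9 * 3/9 * 9/14
s_no = score(X, "no")    # 3/5 * 1/5 * 4/5 * 3/5 * 5/14

print(round(s_yes, 4))  # 0.0053
print(round(s_no, 4))   # 0.0206
print("yes" if s_yes > s_no else "no")  # no
```

Note that the scores are not normalized posteriors; dividing each by their sum would recover P(yes|X) and P(no|X), but the argmax is the same either way.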