Uncertainty. Variables. Probability theory assigns to each sentence a numerical degree of belief between 0 and 1.


Lindsey Booker
5 years ago

1 Bayes Classification
- Uncertainty & Probability
- Bayes' rule
- Choosing hypotheses: maximum a posteriori
- Maximum likelihood: Bayes' concept learning
- Maximum likelihood of a real-valued function
- Bayes optimal classifier
- Joint distributions
- Naive Bayes classifier

2 Uncertainty
Our main tool is probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1. It provides a way of summarizing the uncertainty.

Variables
- Boolean random variables: cavity might be true or false.
- Discrete random variables: weather might be sunny, rainy, cloudy, or snow (Weather=sunny, Weather=rainy, Weather=cloudy, Weather=snow).
- Continuous random variables: the temperature has continuous values.

3 Where do probabilities come from?
- Frequentist: from experiments. From any finite sample, we can estimate the true fraction and also calculate how accurate our estimate is likely to be.
- Subjective: the agent's belief.
- Objectivist: the true nature of the universe. That a coin comes up heads with probability 0.5 is a property of the coin itself.

Before the evidence is obtained: prior probability. $P(a)$ is the prior probability that the proposition $a$ is true, e.g. $P(\mathit{cavity}) = 0.1$.
After the evidence is obtained: posterior probability. $P(a \mid b)$ is the probability of $a$ given that all we know is $b$, e.g. $P(\mathit{cavity} \mid \mathit{toothache}) = 0.8$.

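4 Axioms of Probability
Kolmogorov's axioms, first published in German in 1933.
- All probabilities are between 0 and 1. For any proposition $a$: $0 \le P(a) \le 1$.
- $P(\mathit{true}) = 1$, $P(\mathit{false}) = 0$.
- The probability of a disjunction is given by $P(a \lor b) = P(a) + P(b) - P(a \land b)$.

Product rule
$P(a \land b) = P(a \mid b)\,P(b)$
$P(a \land b) = P(b \mid a)\,P(a)$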

5 Theorem of total probability
If events $A_1, \ldots, A_n$ are mutually exclusive with $\sum_{i=1}^{n} P(A_i) = 1$, then $P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$.

Bayes' rule
Reverend Thomas Bayes set down his findings on probability in "Essay Towards Solving a Problem in the Doctrine of Chances" (1763), published posthumously in the Philosophical Transactions of the Royal Society of London.
$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}$

6 How can we arrive at Bayes' rule?
For any finite sample, we can estimate the true fraction and also calculate how accurate our estimate is likely to be. By using samples, as in most physical measurements, we estimate the values. This approach is called frequentist: we approach the true value by counting the frequency of an event.

Conditional probability
If $\Omega$ is the set of all possible events, $P(\Omega) = 1$, then $a \subseteq \Omega$. The cardinality gives the number of elements of a set: $\mathrm{card}(\Omega)$ is the number of elements of the set $\Omega$, and $\mathrm{card}(a)$ is the number of elements of the set $a$.
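A minimal Python sketch of the counting view of conditional probability described above. The sample space (the integers 1 to 10) and the events `a` and `b` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction


def prob(event, omega):
    """P(event) = card(event) / card(omega), assuming equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(a | b) = card(a and b) / card(b)."""
    if not b:
        raise ValueError("cannot condition on an empty event")
    return Fraction(len(a & b), len(b))


# Hypothetical sample space and events, chosen only for illustration.
omega = set(range(1, 11))
a = {x for x in omega if x % 2 == 0}  # "x is even"
b = {x for x in omega if x > 7}       # "x is greater than 7"

p_a_given_b = cond_prob(a, b)
p_b_given_a = cond_prob(b, a)

# Bayes' rule recovers P(b | a) from P(a | b), P(b) and P(a).
via_bayes = p_a_given_b * prob(b, omega) / prob(a, omega)
print(p_a_given_b, p_b_given_a, via_bayes)
```

Exact fractions are used so that the two routes to P(b | a) can be compared without rounding error.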

7 Bayes' rule
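One way to arrive at the rule from the counting definitions of the previous slide is sketched here. It assumes a finite $\Omega$ with equally likely elements, which is an assumption of this sketch rather than something the slides state.

```latex
\begin{align*}
P(a) &= \frac{\mathrm{card}(a)}{\mathrm{card}(\Omega)}, \qquad
P(b) = \frac{\mathrm{card}(b)}{\mathrm{card}(\Omega)} \\
P(a \mid b) &= \frac{\mathrm{card}(a \land b)}{\mathrm{card}(b)}
             = \frac{P(a \land b)}{P(b)} \\
P(b \mid a) &= \frac{\mathrm{card}(a \land b)}{\mathrm{card}(a)}
             = \frac{P(a \land b)}{P(a)} \\
P(a \land b) &= P(a \mid b)\,P(b) = P(b \mid a)\,P(a) \\
P(b \mid a) &= \frac{P(a \mid b)\,P(b)}{P(a)}
\end{align*}
```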

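8 Diagnosis
What is the probability of meningitis in a patient with a stiff neck?
A doctor knows that the disease meningitis causes the patient to have a stiff neck 50% of the time -> $P(s \mid m) = 0.5$
Prior probabilities:
- That the patient has meningitis is 1/… -> $P(m)$
- That the patient has a stiff neck is 1/20 -> $P(s)$
$P(m \mid s) = \frac{P(s \mid m)\,P(m)}{P(s)} = \frac{0.5 \cdot P(m)}{1/20}$

Normalization
$P(y \mid x) = \alpha\,P(x \mid y)\,P(y)$
$P(\neg y \mid x) = \alpha\,P(x \mid \neg y)\,P(\neg y)$
$P(y \mid x) + P(\neg y \mid x) = 1$
$\mathbf{P}(Y \mid X) = \alpha\,\mathbf{P}(X \mid Y)\,\mathbf{P}(Y)$
(Of the slide's numeric normalization example, only the values 0.6 and 0.08 are legible.)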
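The diagnosis calculation and the normalization trick can be sketched in Python as follows. The values $P(s \mid m) = 0.5$ and $P(s) = 1/20$ come from the slide; the prior `P_M = 1/50_000` is an assumption made here purely for illustration, because the slide's own figure is not legible. The function names are hypothetical.

```python
def bayes_rule(likelihood, prior, evidence):
    """P(h | e) = P(e | h) * P(h) / P(e)."""
    return likelihood * prior / evidence


def normalize(unnormalized):
    """Scale the values by alpha = 1 / sum so that they add up to 1."""
    alpha = 1.0 / sum(unnormalized.values())
    return {key: alpha * value for key, value in unnormalized.items()}


P_S_GIVEN_M = 0.5   # from the slide: stiff neck in 50% of meningitis cases
P_S = 1 / 20        # from the slide: prior probability of a stiff neck
P_M = 1 / 50_000    # ASSUMPTION: illustrative prior, not recoverable from the slide

# Direct application of Bayes' rule.
p_m_given_s = bayes_rule(P_S_GIVEN_M, P_M, P_S)

# Same answer via normalization: P(s | not m) follows from total probability,
# and P(s) never has to be plugged in explicitly.
p_s_given_not_m = (P_S - P_S_GIVEN_M * P_M) / (1 - P_M)
posterior = normalize({
    "m": P_S_GIVEN_M * P_M,
    "not_m": p_s_given_not_m * (1 - P_M),
})
print(p_m_given_s, posterior)
```

With the assumed prior, both routes give a posterior of about 0.0002: a stiff neck alone is weak evidence because the disease is so rare.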

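9 Bayes Theorem
$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$
- $P(h)$ = prior probability of hypothesis $h$
- $P(D)$ = prior probability of training data $D$
- $P(h \mid D)$ = probability of $h$ given $D$
- $P(D \mid h)$ = probability of $D$ given $h$

Choosing Hypotheses
Generally we want the most probable hypothesis given the training data.
Maximum a posteriori hypothesis $h_{MAP}$:
$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$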

10 If we assume $P(h_i) = P(h_j)$ for all $h_i$ and $h_j$, then we can simplify further and choose the maximum likelihood (ML) hypothesis:
$h_{ML} = \arg\max_{h_i \in H} P(D \mid h_i)$

11 Example
Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (-) in only 97% of the cases in which the disease is not present. Furthermore, … of the entire population have this cancer.
Suppose a positive result (+) is returned...

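12 Normalization
$P(\mathit{cancer} \mid +) = \frac{P(+ \mid \mathit{cancer})\,P(\mathit{cancer})}{P(+ \mid \mathit{cancer})\,P(\mathit{cancer}) + P(+ \mid \neg\mathit{cancer})\,P(\neg\mathit{cancer})}$
$P(\neg\mathit{cancer} \mid +) = \frac{P(+ \mid \neg\mathit{cancer})\,P(\neg\mathit{cancer})}{P(+ \mid \mathit{cancer})\,P(\mathit{cancer}) + P(+ \mid \neg\mathit{cancer})\,P(\neg\mathit{cancer})}$
The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method.

Brute-Force Bayes Concept Learning
1. For each hypothesis $h$ in $H$, calculate the posterior probability $P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$.
2. Output the hypothesis $h_{MAP}$ with the highest posterior probability, $h_{MAP} = \arg\max_{h \in H} P(h \mid D)$.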
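The cancer example can be worked through as a small Python sketch that contrasts the MAP and ML choices. The test accuracies (98% and 97%) come from the slides; the prior `P_CANCER = 0.008` is an assumed, illustrative value, since the population fraction is missing from the slide text.

```python
P_POS_GIVEN_CANCER = 0.98          # from the slide: correct positive rate
P_NEG_GIVEN_NOT_CANCER = 0.97      # from the slide: correct negative rate
P_CANCER = 0.008                   # ASSUMPTION: illustrative prior, not from the slide

likelihood = {                     # P(+ | h)
    "cancer": P_POS_GIVEN_CANCER,
    "not_cancer": 1.0 - P_NEG_GIVEN_NOT_CANCER,
}
prior = {"cancer": P_CANCER, "not_cancer": 1.0 - P_CANCER}

# Unnormalized posteriors P(+ | h) * P(h), then normalization.
joint = {h: likelihood[h] * prior[h] for h in prior}
evidence = sum(joint.values())
posterior = {h: joint[h] / evidence for h in joint}

h_map = max(posterior, key=posterior.get)    # uses the prior
h_ml = max(likelihood, key=likelihood.get)   # ignores the prior
print(posterior, h_map, h_ml)
```

Under the assumed prior the posterior probability of cancer is only about 0.21, so the MAP hypothesis is "not_cancer" even though the ML hypothesis is "cancer". This is the dependence on the prior that the slide points out.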

13 Given no prior knowledge that one hypothesis is more likely than another, what values should we specify for $P(h)$? What choice shall we make for $P(D \mid h)$? (The hypothesis generates the data...)
Choose $P(h)$ to be the uniform distribution: $P(h) = \frac{1}{|H|}$ for all $h$ in $H$ (how often is $h$ present? Frequency...)
$P(D \mid h) = 1$ if $h$ is consistent with $D$
$P(D \mid h) = 0$ otherwise

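14 $P(D) = \sum_{h_i \in H} P(D \mid h_i)\,P(h_i) = \sum_{h_i \in VS_{H,D}} 1 \cdot \frac{1}{|H|} = \frac{|VS_{H,D}|}{|H|}$
The version space $VS_{H,D}$ is the subset of hypotheses from $H$ that are consistent with the training examples in $D$. (The hypothesis generates this data...)
If $h$ is inconsistent with $D$: $P(h \mid D) = \frac{0 \cdot P(h)}{P(D)} = 0$
If $h$ is consistent with $D$: $P(h \mid D) = \frac{1 \cdot \frac{1}{|H|}}{\frac{|VS_{H,D}|}{|H|}} = \frac{1}{|VS_{H,D}|}$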
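A minimal sketch of brute-force Bayes concept learning with a uniform prior and a 0/1 likelihood, as described on the last few slides. The hypothesis space (threshold rules `x >= t`) and the two training examples are hypothetical and chosen only to keep the example small.

```python
from fractions import Fraction


def brute_force_posteriors(hypotheses, data):
    """Return P(h | D) for every h, with P(h) = 1/|H| and P(D | h) in {0, 1}."""
    prior = Fraction(1, len(hypotheses))
    likelihood = {
        name: Fraction(int(all(h(x) == y for x, y in data)))
        for name, h in hypotheses.items()
    }
    # P(D) = sum over h of P(D | h) * P(h) = |VS| / |H|
    p_data = sum(likelihood[name] * prior for name in hypotheses)
    if p_data == 0:
        raise ValueError("no hypothesis is consistent with the data")
    return {name: likelihood[name] * prior / p_data for name in hypotheses}


# Hypothetical hypothesis space: h_t(x) is True exactly when x >= t.
hypotheses = {t: (lambda x, t=t: x >= t) for t in range(6)}
# Hypothetical training data D as (instance, label) pairs.
data = [(1, False), (4, True)]

posteriors = brute_force_posteriors(hypotheses, data)
version_space = [t for t, p in posteriors.items() if p > 0]
print(version_space, posteriors)
```

Every hypothesis in the version space (here the thresholds 2, 3 and 4) gets posterior $1/|VS_{H,D}|$, and every inconsistent one gets 0, so each consistent hypothesis is a MAP hypothesis.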

15 Hypothesis generates this data...
Maximum Likelihood of a real-valued function (see EM Clustering)

16 Maximize the natural log of this instead...

Bayes optimal Classifier
A weighted majority classifier.
What is the most probable classification of the new instance given the training data? The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
If the classification of the new example can take any value $v_j$ from some set $V$, then the probability $P(v_j \mid D)$ that the correct classification for the new instance is $v_j$ is just
$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$
(remember, we have to calculate $P(h_i \mid D)$).
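The weighted vote can be sketched in Python as below. The three hypotheses, their posteriors (0.4, 0.3, 0.3) and their predictions are hypothetical numbers picked for illustration; they do not appear in the slides.

```python
def bayes_optimal(posteriors, predictions, values):
    """Return the value v maximizing sum_h P(v | h) * P(h | D), plus all scores."""
    scores = {
        v: sum(predictions[h].get(v, 0.0) * p for h, p in posteriors.items())
        for v in values
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical posteriors P(h | D).
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# Hypothetical predictions P(v | h): each hypothesis votes deterministically.
predictions = {
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}

best, scores = bayes_optimal(posteriors, predictions, values=("+", "-"))
h_map = max(posteriors, key=posteriors.get)
print(best, scores, h_map)
```

In this setup the single most probable hypothesis `h1` predicts "+", but the combined vote favors "-" with weight 0.6, which shows why the most probable classification can differ from the prediction of the MAP hypothesis.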

17 Bayes optimal classification:
$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$

Gibbs Algorithm
The Bayes optimal classifier provides the best result, but it can be expensive if there are many hypotheses, because we have to compute the posterior probability for every hypothesis.
Gibbs algorithm:
1. Choose one hypothesis at random, according to $P(h \mid D)$.
2. Use this to classify the new instance.
Surprisingly, its expected error is no worse than twice that of the Bayes optimal classifier (Haussler et al.).
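A short sketch of the Gibbs algorithm's two steps, using Python's standard `random.choices` for the weighted draw. The posteriors and the constant-valued hypotheses are hypothetical placeholders.

```python
import random

# Hypothetical posteriors P(h | D) and hypotheses (each maps an instance to a label).
POSTERIORS = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
HYPOTHESES = {
    "h1": lambda x: "+",
    "h2": lambda x: "-",
    "h3": lambda x: "-",
}


def gibbs_classify(x, posteriors, hypotheses, rng):
    """Draw one hypothesis according to P(h | D) and classify x with it."""
    names = list(posteriors)
    weights = [posteriors[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return chosen, hypotheses[chosen](x)


rng = random.Random(0)  # fixed seed so that repeated runs behave the same
chosen, label = gibbs_classify(None, POSTERIORS, HYPOTHESES, rng)

# Over many draws, each hypothesis is used roughly in proportion to its posterior.
draws = [gibbs_classify(None, POSTERIORS, HYPOTHESES, rng)[0] for _ in range(10_000)]
h1_share = draws.count("h1") / len(draws)
print(chosen, label, h1_share)
```

Only one hypothesis is evaluated per classification, which is the saving over the Bayes optimal classifier; the price is a randomized answer.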

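18 Joint distribution
A joint distribution for toothache, cavity, catch (the dentist's probe catches in my tooth :-).
We need to know the conditional probabilities of the conjunction of toothache and cavity. What can a dentist conclude if the probe catches in the aching tooth?
$P(\mathit{cavity} \mid \mathit{toothache} \land \mathit{catch}) = \frac{P(\mathit{toothache} \land \mathit{catch} \mid \mathit{cavity})\,P(\mathit{cavity})}{P(\mathit{toothache} \land \mathit{catch})}$
For $n$ possible variables there are $2^n$ possible combinations.

Conditional Independence
Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of toothache:
$P(\mathit{catch} \mid \mathit{cavity} \land \mathit{toothache}) = P(\mathit{catch} \mid \mathit{cavity})$
$P(\mathit{toothache} \mid \mathit{cavity} \land \mathit{catch}) = P(\mathit{toothache} \mid \mathit{cavity})$

Independence between $a$ and $b$:
$P(a \mid b) = P(a)$
$P(b \mid a) = P(b)$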

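19 $P(a \land b) = P(a)\,P(b)$
$P(\mathit{toothache}, \mathit{catch}, \mathit{cavity}, \mathit{Weather} = \mathit{cloudy}) = P(\mathit{Weather} = \mathit{cloudy})\,P(\mathit{toothache}, \mathit{catch}, \mathit{cavity})$
The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI. This can work well even when the assumption is not true!
A single cause directly influences a number of effects, all of which are conditionally independent:
$P(\mathit{cause}, \mathit{effect}_1, \mathit{effect}_2, \ldots, \mathit{effect}_n) = P(\mathit{cause}) \prod_{i=1}^{n} P(\mathit{effect}_i \mid \mathit{cause})$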
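The dentist query and the conditional-independence claims can be checked numerically with a small sketch. The eight joint probabilities below are assumed, illustrative values (the slides do not list the table); they are chosen so that toothache and catch are conditionally independent given cavity.

```python
VARS = ("toothache", "catch", "cavity")

# ASSUMPTION: illustrative full joint distribution over the three Boolean variables.
JOINT = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}


def prob(**fixed):
    """Marginal probability of the partial assignment given as keyword arguments."""
    index = {name: i for i, name in enumerate(VARS)}
    return sum(
        p for world, p in JOINT.items()
        if all(world[index[name]] == value for name, value in fixed.items())
    )


def cond(query, given):
    """P(query | given); query and given are dicts over disjoint variables."""
    return prob(**query, **given) / prob(**given)


# What can the dentist conclude if the probe catches in the aching tooth?
p_cavity = cond({"cavity": True}, {"toothache": True, "catch": True})

# Conditional independence: P(catch | cavity, toothache) equals P(catch | cavity).
lhs = cond({"catch": True}, {"cavity": True, "toothache": True})
rhs = cond({"catch": True}, {"cavity": True})

# Single cause, independent effects: P(cause) * product of P(effect_i | cause).
factored = (
    prob(cavity=True)
    * cond({"toothache": True}, {"cavity": True})
    * cond({"catch": True}, {"cavity": True})
)
print(p_cavity, lhs, rhs, factored)
```

With three Boolean variables the table has $2^3 = 8$ entries; the factored form needs only the prior of the cause and one conditional per effect, which is what makes the decomposition attractive for larger domains.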

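20 Naive Bayes Classifier
Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.
When to use:
- Moderate or large training set available
- Attributes that describe instances are conditionally independent given the classification
Successful applications:
- Diagnosis
- Classifying text documents

Naive Bayes Classifier
Assume a target function $f: X \to V$, where each instance $x$ is described by attributes $a_1, a_2, \ldots, a_n$. The most probable value of $f(x)$ is:
$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n) = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\,P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\,P(v_j)$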

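21 Naive Bayes assumption:
$P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$
which gives
$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$

Naive Bayes Algorithm
For each target value $v_j$: $\hat{P}(v_j) \leftarrow$ estimate $P(v_j)$
For each attribute value $a_i$ of each attribute $a$: $\hat{P}(a_i \mid v_j) \leftarrow$ estimate $P(a_i \mid v_j)$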

22 Training dataset
Class: C1: buys_computer = "yes"; C2: buys_computer = "no"
Data sample: X = (age<=30, income=medium, student=yes, credit_rating=fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
…      high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
…      low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
…      medium  no       excellent      yes
…      high    yes      fair           yes
>40    medium  no       excellent      no

Naïve Bayesian Classifier: Example
Compute P(X | C_i) for each class:
P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
P(age="<=30" | buys_computer="no") = 3/5 = 0.6
P(income="medium" | buys_computer="yes") = 4/9 = 0.444
P(income="medium" | buys_computer="no") = 2/5 = 0.4
P(student="yes" | buys_computer="yes") = 6/9 = 0.667
P(student="yes" | buys_computer="no") = 1/5 = 0.2
P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4
P(buys_computer="yes") = 9/14
P(buys_computer="no") = 5/14

X = (age<=30, income=medium, student=yes, credit_rating=fair)
P(X | C_i):
P(X | buys_computer="yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer="no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X | C_i) * P(C_i):
P(X | buys_computer="yes") * P(buys_computer="yes") = 0.028
P(X | buys_computer="no") * P(buys_computer="no") = 0.007
X belongs to class buys_computer = "yes"
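The arithmetic of this example can be reproduced with a few lines of Python. The priors and conditional probabilities are the fractions stated on the slide; the dictionary layout and the function name are just one possible way to organize them.

```python
from fractions import Fraction as F
from math import prod

# Class priors P(C_i), as stated on the slide.
priors = {"yes": F(9, 14), "no": F(5, 14)}

# Conditional probabilities P(attribute value | class), as stated on the slide.
likelihoods = {
    "yes": {
        "age<=30": F(2, 9),
        "income=medium": F(4, 9),
        "student=yes": F(6, 9),
        "credit_rating=fair": F(6, 9),
    },
    "no": {
        "age<=30": F(3, 5),
        "income=medium": F(2, 5),
        "student=yes": F(1, 5),
        "credit_rating=fair": F(2, 5),
    },
}


def naive_bayes_scores(priors, likelihoods, features):
    """Score each class by P(C) * product of P(feature | C)."""
    return {
        c: priors[c] * prod(likelihoods[c][f] for f in features)
        for c in priors
    }


x = ["age<=30", "income=medium", "student=yes", "credit_rating=fair"]
scores = naive_bayes_scores(priors, likelihoods, x)
prediction = max(scores, key=scores.get)
print({c: round(float(s), 3) for c, s in scores.items()}, prediction)
```

The scores are unnormalized; dividing each by their sum would give the posterior class probabilities, but the argmax is unaffected.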

23 The conditional independence assumption is often violated... but it works surprisingly well anyway.

Estimating Probabilities
We have estimated probabilities by the fraction of times the event is observed to occur, $n_c$, over the total number of opportunities, $n$: $\frac{n_c}{n}$.
It provides poor estimates when $n_c$ is very small.
What if none of the training instances with target value $v_j$ have attribute value $a_i$? Then $n_c$ is 0.

24 When $n_c$ is very small:
$\hat{P}(a_i \mid v_j) = \frac{n_c + m\,p}{n + m}$
- $n$ is the number of training examples for which $v = v_j$
- $n_c$ is the number of examples for which $v = v_j$ and $a = a_i$
- $p$ is the prior estimate
- $m$ is the weight given to the prior (i.e. the number of "virtual" examples)

Naïve Bayesian Classifier: Comments
Advantages:
- Easy to implement
- Good results obtained in most of the cases
Disadvantages:
- Assumption: class conditional independence, therefore loss of accuracy
- Practically, dependencies exist among variables. E.g., hospital patients: Profile (age, family history, etc.), Symptoms (fever, cough, etc.), Disease (lung cancer, diabetes, etc.)
- Dependencies among these cannot be modeled by the Naïve Bayesian Classifier
How to deal with these dependencies? Bayesian Belief Networks
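Returning to the smoothed estimate at the top of this slide (commonly called the m-estimate), a minimal sketch follows. The counts, the prior `p = 1/3` and the weight `m = 3` in the usage lines are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def m_estimate(n_c, n, p, m):
    """Smoothed estimate (n_c + m * p) / (n + m).

    n_c: examples with v = v_j and a = a_i
    n:   examples with v = v_j
    p:   prior estimate of the probability
    m:   weight of the prior, i.e. the number of "virtual" examples
    """
    if n + m == 0:
        raise ValueError("need at least one real or virtual example")
    return (n_c + m * p) / (n + m)


# Hypothetical case: the attribute value never occurs with this class (n_c = 0),
# the class has n = 5 examples, and the prior is uniform over 3 attribute values.
raw = Fraction(0, 5)
smoothed = m_estimate(0, 5, Fraction(1, 3), 3)
print(raw, smoothed)
```

The raw estimate of 0 would zero out the whole naive Bayes product for that class; the smoothed value stays positive, and with `m = 0` the function reduces to the plain fraction $n_c / n$.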
