Bayesian Belief Networks
Uncertainty & Probability, Bayes' rule, Choosing Hypotheses (Maximum a posteriori, Maximum Likelihood), Bayes concept learning, Maximum Likelihood of real-valued functions, Bayes optimal classifier, Joint distributions, Naive Bayes classifier
Uncertainty
Our main tool is probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1. It provides a way of summarizing uncertainty.
Variables:
Boolean random variables: cavity might be true or false
Discrete random variables: weather might be sunny, rainy, cloudy, snow; P(Weather=sunny), P(Weather=rainy), P(Weather=cloudy), P(Weather=snow)
Continuous random variables: the temperature has continuous values
Where do probabilities come from?
Frequentist: from experiments; from any finite sample we can estimate the true fraction and also calculate how accurate our estimate is likely to be
Subjectivist: an agent's degree of belief
Objectivist: probabilities are part of the true nature of the universe, e.g. that a coin comes up heads with probability 0.5 is a property of the coin itself
Before the evidence is obtained: prior probability P(a), the prior probability that the proposition is true, e.g. P(cavity)=0.1
After the evidence is obtained: posterior probability P(a|b), the probability of a given that all we know is b, e.g. P(cavity|toothache)=0.8
Axioms of Probability (Kolmogorov's axioms, first published in German in 1933)
All probabilities are between 0 and 1: for any proposition a, 0 ≤ P(a) ≤ 1
P(true) = 1, P(false) = 0
The probability of a disjunction is given by P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Product rule: P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a)
Theorem of total probability: if events A_1, ..., A_n are mutually exclusive with ∑_i P(A_i) = 1, then P(B) = ∑_i P(B|A_i)P(A_i)
Bayes' rule (Reverend Thomas Bayes, 1702-1761):
P(b|a) = P(a|b)P(b) / P(a)
He set down his findings on probability in "Essay Towards Solving a Problem in the Doctrine of Chances" (1763), published posthumously in the Philosophical Transactions of the Royal Society of London.
Diagnosis
What is the probability of meningitis in a patient with a stiff neck? A doctor knows that the disease meningitis causes the patient to have a stiff neck 50% of the time -> P(s|m) = 0.5
Prior probabilities: that a patient has meningitis is 1/50,000 -> P(m) = 0.00002; that a patient has a stiff neck is 1/20 -> P(s) = 0.05
P(m|s) = P(s|m)P(m) / P(s) = 0.5 × 0.00002 / 0.05 = 0.0002
Normalization: since P(y|x) + P(¬y|x) = 1, we can avoid computing P(x) and write P(Y|X) = αP(X|Y)P(Y), where α is the normalizing constant; e.g. α⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
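The meningitis arithmetic above can be checked with a minimal sketch; the helper name `bayes_rule` is just for illustration, not from the slides.

```python
# Bayes' rule: P(h|e) = P(e|h) * P(h) / P(e)
def bayes_rule(p_e_given_h, p_h, p_e):
    return p_e_given_h * p_h / p_e

# Meningitis example from the slide.
p_m_given_s = bayes_rule(
    p_e_given_h=0.5,   # P(s|m): stiff neck given meningitis
    p_h=1 / 50000,     # P(m): prior for meningitis
    p_e=1 / 20,        # P(s): prior for stiff neck
)
print(p_m_given_s)  # 0.0002
```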
Bayes Theorem
P(h|D) = P(D|h)P(h) / P(D)
P(h) = prior probability of hypothesis h
P(D) = prior probability of training data D
P(h|D) = probability of h given D
P(D|h) = probability of D given h
Choosing Hypotheses
Generally we want the most probable hypothesis given the training data, the maximum a posteriori hypothesis h_MAP:
h_MAP = argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h)P(h) / P(D) = argmax_{h∈H} P(D|h)P(h)
If we assume P(h_i) = P(h_j) for all h_i and h_j, then we can simplify further and choose the maximum likelihood (ML) hypothesis:
h_ML = argmax_{h∈H} P(D|h)
Example: does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (-) in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.
P(cancer) = 0.008, P(¬cancer) = 0.992
P(+|cancer) = 0.98, P(-|cancer) = 0.02
P(+|¬cancer) = 0.03, P(-|¬cancer) = 0.97
Suppose a positive result (+) is returned. Then
P(+|cancer)P(cancer) = 0.98 × 0.008 ≈ 0.0078
P(+|¬cancer)P(¬cancer) = 0.03 × 0.992 ≈ 0.0298
so h_MAP = ¬cancer.
Normalization:
P(cancer|+) = 0.0078 / (0.0078 + 0.0298) = 0.20745
P(¬cancer|+) = 0.0298 / (0.0078 + 0.0298) = 0.79255
The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method.
Joint distribution
A joint distribution for toothache, cavity, catch (the dentist's probe catches in my tooth :-( ). We need to know the conditional probability of cavity given the conjunction of toothache and catch. What can a dentist conclude if the probe catches in the aching tooth?
P(cavity|toothache ∧ catch) = P(toothache ∧ catch|cavity)P(cavity) / P(toothache ∧ catch)
For n Boolean variables there are 2^n possible combinations.
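The cancer-test posterior can be sketched as follows; the variable names are illustrative, and the slides' 0.20745 comes from first rounding the products to 0.0078 and 0.0298.

```python
# Priors and test characteristics from the slide.
p_cancer = 0.008
p_pos_given_cancer = 0.98       # P(+|cancer)
p_pos_given_no_cancer = 0.03    # P(+|¬cancer) = 1 - 0.97

# Unnormalized posteriors P(+|h)P(h) for each hypothesis.
u_cancer = p_pos_given_cancer * p_cancer              # 0.00784
u_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)  # 0.02976

# Normalize so the two posteriors sum to 1.
alpha = 1 / (u_cancer + u_no_cancer)
print(alpha * u_cancer, alpha * u_no_cancer)  # ≈ 0.21, 0.79
```

Note that h_MAP = ¬cancer even though the test is "98% accurate": the tiny prior P(cancer) = 0.008 dominates.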
Conditional Independence
Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of a toothache:
P(catch|cavity ∧ toothache) = P(catch|cavity)
P(toothache|cavity ∧ catch) = P(toothache|cavity)
Independence between a and b:
P(a|b) = P(a), P(b|a) = P(b), P(a ∧ b) = P(a)P(b)
P(toothache, catch, cavity, Weather=cloudy) = P(Weather=cloudy) P(toothache, catch, cavity)
The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI. This can work well even when the assumption is not true!
A single cause directly influences a number of effects, all of which are conditionally independent:
P(cause, effect_1, ..., effect_n) = P(cause) ∏_{i=1}^{n} P(effect_i|cause)
Naive Bayes Classifier
Assume a target function f: X → V, where each instance x is described by attributes a_1, a_2, ..., a_n. The most probable value of f(x) is:
v_MAP = argmax_{v_j∈V} P(v_j|a_1, a_2, ..., a_n) = argmax_{v_j∈V} P(a_1, a_2, ..., a_n|v_j)P(v_j)
Naive Bayes assumption: P(a_1, a_2, ..., a_n|v_j) = ∏_i P(a_i|v_j), which gives
v_NB = argmax_{v_j∈V} P(v_j) ∏_i P(a_i|v_j)
Naive Bayes Algorithm
For each target value v_j: estimate P(v_j)
For each attribute value a_i of each attribute a: estimate P(a_i|v_j)
Training dataset
Classes: C1: buys_computer = yes; C2: buys_computer = no
Data sample: X = (age<=30, income=medium, student=yes, credit_rating=fair)

age     income  student credit_rating buys_computer
<=30    high    no      fair          no
<=30    high    no      excellent     no
31..40  high    no      fair          yes
>40     medium  no      fair          yes
>40     low     yes     fair          yes
>40     low     yes     excellent     no
31..40  low     yes     excellent     yes
<=30    medium  no      fair          no
<=30    low     yes     fair          yes
>40     medium  yes     fair          yes
<=30    medium  yes     excellent     yes
31..40  medium  no      excellent     yes
31..40  high    yes     fair          yes
>40     medium  no      excellent     no

Naive Bayesian Classifier: Example
Compute P(X|C_i) for each class:
P(age=<=30|buys_computer=yes) = 2/9 = 0.222
P(age=<=30|buys_computer=no) = 3/5 = 0.6
P(income=medium|buys_computer=yes) = 4/9 = 0.444
P(income=medium|buys_computer=no) = 2/5 = 0.4
P(student=yes|buys_computer=yes) = 6/9 = 0.667
P(student=yes|buys_computer=no) = 1/5 = 0.2
P(credit_rating=fair|buys_computer=yes) = 6/9 = 0.667
P(credit_rating=fair|buys_computer=no) = 2/5 = 0.4
P(buys_computer=yes) = 9/14
P(buys_computer=no) = 5/14
For X = (age<=30, income=medium, student=yes, credit_rating=fair):
P(X|buys_computer=yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer=no) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|C_i)P(C_i):
P(X|buys_computer=yes) P(buys_computer=yes) = 0.028
P(X|buys_computer=no) P(buys_computer=no) = 0.007
X belongs to class buys_computer=yes
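The worked example above can be reproduced with a minimal Naive Bayes sketch over the same table (function and variable names are illustrative; no smoothing, exactly as on the slides).

```python
from collections import Counter, defaultdict

# buys_computer table; attribute order: age, income, student,
# credit_rating; the last column is the class label.
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]

class_counts = Counter(row[-1] for row in data)          # P(v_j) numerators
attr_counts = defaultdict(int)                           # P(a_i|v_j) numerators
for row in data:
    for i, value in enumerate(row[:-1]):
        attr_counts[(i, value, row[-1])] += 1

def classify(x):
    """Return argmax_v P(v) * prod_i P(a_i|v)."""
    scores = {}
    for cls, n in class_counts.items():
        p = n / len(data)
        for i, value in enumerate(x):
            p *= attr_counts[(i, value, cls)] / n
        scores[cls] = p
    return max(scores, key=scores.get)

print(classify(("<=30", "medium", "yes", "fair")))  # yes
```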
The conditional independence assumption is often violated ...but it works surprisingly well anyway. The Naive Bayes assumption of conditional independence is too restrictive, but the problem is intractable without some such assumption.
Bayesian belief networks describe conditional independence among subsets of variables and allow combining prior knowledge about (in)dependencies among variables with observed training data.
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ "directly influences")
a conditional distribution for each node given its parents: P(X_i|Parents(X_i))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.
A Bayesian belief network allows a subset of the variables to be conditionally independent. It is a graphical model of causal relationships: it represents dependencies among the variables and gives a specification of the joint probability distribution.
Example graph with nodes X, Y, Z, P:
Nodes: random variables
Links: dependency
X and Y are the parents of Z, and Y is the parent of P
No dependency between Z and P
The graph has no loops or cycles
Example
The topology of the network encodes conditional independence assertions:
Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity
Bayesian Belief Network: An Example
Nodes: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea
The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents:

      (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC    0.8      0.5       0.7       0.1
~LC   0.2      0.5       0.3       0.9

Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
Belief Networks
P(B) = 0.001, P(E) = 0.002

Alarm CPT:
B  E  P(A)
t  t  0.95
t  f  0.94
f  t  0.29
f  f  0.001

JohnCalls CPT:    MaryCalls CPT:
A  P(J)           A  P(M)
t  0.90           t  0.70
f  0.05           f  0.01

Full joint distribution:
P(x_1, ..., x_n) = ∏_{i=1}^{n} P(x_i|parents(X_i))
P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a)P(m|a)P(a|¬b ∧ ¬e)P(¬b)P(¬e) = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00063
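The product formula above can be sketched directly from the CPTs; the dictionary encoding below is one illustrative way to store them, not a fixed API.

```python
# CPTs of the burglary network (probabilities of the variable being True).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {  # P(A=True | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(J=True | A)
P_M = {True: 0.70, False: 0.01}  # P(M=True | A)

def joint(b, e, a, j, m):
    """Full joint as the product of each node's CPT entry."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) from the slide:
print(joint(False, False, True, True, True))  # ≈ 0.00063
```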
Compactness
A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for X_i = true (the number for X_i = false is just 1-p). If each variable has no more than k parents, the complete network requires O(n·2^k) numbers, i.e. it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31).
Inference in Bayesian Networks
How can one infer the (probabilities of) values of one or more network variables, given observed values of others? A Bayes net contains all information needed for this inference. If only one variable has an unknown value, it is easy to infer. In the general case, the problem is NP-hard.
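The 10-vs-31 count for the burglary net can be checked with two lines (the `parents` mapping just records each node's number of parents):

```python
# Number of parents of each Boolean node in the burglary network.
parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}

# One independent number per CPT row (2**k rows for k parents).
compact = sum(2 ** k for k in parents.values())
# The full joint over n Boolean variables needs 2**n - 1 numbers.
full_joint = 2 ** len(parents) - 1

print(compact, full_joint)  # 10 31
```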
Example
In the burglary network, we might observe the event in which JohnCalls=true and MaryCalls=true. We could then ask for the probability that a burglary has occurred: P(Burglary|JohnCalls=true, MaryCalls=true)
Remember, from the joint distribution:
P(cavity|toothache) = P(cavity ∧ toothache) / P(toothache) = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6
P(¬cavity|toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
Normalization
1 = P(y|x) + P(¬y|x), so P(Y|X) = αP(X|Y)P(Y), where α is the normalizing constant; e.g. α⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
P(Cavity|toothache) = αP(Cavity, toothache)
= α[P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α[⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
With X the query variable, E the evidence variables, and Y the remaining unobserved variables:
P(X|e) = αP(X, e) = α ∑_y P(X, e, y)
where the summation is over all possible y (all possible values of the unobserved variables Y).
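Summing out the hidden variable and normalizing can be sketched with the four joint entries the slides use (the dictionary layout is illustrative):

```python
# Joint entries P(Cavity, toothache=true, Catch), keyed by (cavity, catch).
joint_toothache = {
    (True, True): 0.108, (True, False): 0.012,
    (False, True): 0.016, (False, False): 0.064,
}

# Sum out Catch to get unnormalized P(Cavity, toothache).
u_cav = joint_toothache[(True, True)] + joint_toothache[(True, False)]    # 0.12
u_not = joint_toothache[(False, True)] + joint_toothache[(False, False)]  # 0.08

# alpha makes the two entries sum to 1.
alpha = 1 / (u_cav + u_not)
print(alpha * u_cav, alpha * u_not)  # ≈ 0.6, 0.4
```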
P(Burglary|JohnCalls=true, MaryCalls=true)
The hidden variables of the query are Earthquake and Alarm:
P(B|j, m) = αP(B, j, m) = α ∑_e ∑_a P(B, e, a, j, m)
For Burglary=true in the Bayesian network:
P(b|j, m) = α ∑_e ∑_a P(b)P(e)P(a|b, e)P(j|a)P(m|a)
To compute this we had to add four terms, each computed by multiplying five numbers. In the worst case, where we have to sum out almost all variables, the complexity of a network with n Boolean variables is O(2^n).
P(b) is constant and can be moved out, and the P(e) term can be moved outside the inner summation:
P(b|j, m) = αP(b) ∑_e P(e) ∑_a P(a|b, e)P(j|a)P(m|a)
Given JohnCalls=true and MaryCalls=true, the probability that a burglary has occurred is about 28%:
P(B|j, m) = α⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
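The enumeration above can be sketched end to end; the CPT dictionaries and the `unnormalized` helper are illustrative names, but the nested sums follow the slide's formula exactly.

```python
# CPTs of the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=True | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=True | A)
P_M = {True: 0.70, False: 0.01}  # P(M=True | A)

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a|b,e) P(j|a) P(m|a), with j = m = true."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_E[e] * p_a * P_J[a] * P_M[a]
    return P_B[b] * total

u = {b: unnormalized(b) for b in (True, False)}
alpha = 1 / sum(u.values())
print(alpha * u[True])  # ≈ 0.284
```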
Variable elimination algorithm
Eliminates repeated calculation via dynamic programming, and removes variables irrelevant to the query (where X is the query variable and E are the evidence variables).
Complexity of exact inference
The burglary network belongs to the family of networks in which there is at most one undirected path between any two nodes. These are called singly connected networks or polytrees. The time and space complexity of exact inference in polytrees is linear in the size of the network, where size is defined by the number of CPT entries. If the number of parents of each node is bounded by a constant, then the complexity will also be linear in the number of nodes. For multiply connected networks, variable elimination can have exponential time and space complexity.
Constructing Bayesian Networks
A Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the ordering, given its parents.
P(MaryCalls|JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls|Alarm)
Conditional independence relations in Bayesian networks: the topological semantics is given by either of the specifications of DESCENDANTS or MARKOV BLANKET.
Local semantics
Example: JohnCalls is independent of Burglary and Earthquake given the value of Alarm.
Example: Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake.
Constructing Bayesian networks
1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n: add X_i to the network and select parents from X_1, ..., X_{i-1} such that P(X_i|Parents(X_i)) = P(X_i|X_1, ..., X_{i-1})
This choice of parents guarantees:
P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i|X_1, ..., X_{i-1})   (chain rule)
                 = ∏_{i=1}^{n} P(X_i|Parents(X_i))         (by construction)
The compactness of Bayesian networks is an example of locally structured systems: each subcomponent interacts directly with only a bounded number of other components. Constructing Bayesian networks is difficult: each variable should be directly influenced by only a few others, and the network topology should reflect these direct influences.
Example
Suppose we choose the ordering M, J, A, B, E:
P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)? Yes
P(B|A, J, M) = P(B)? No
P(E|B, A, J, M) = P(E|A)? No
P(E|B, A, J, M) = P(E|A, B)? Yes
Example contd.
Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed. Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm.