School of EECS Washington State University Artificial Intelligence 1
} Full joint probability distribution
  Can answer any query
  But typically too large
} Conditional independence
  Can reduce the number of probabilities needed
  P(X | Y, Z) = P(X | Z), if X is independent of Y given Z
} Bayesian network
  A concise representation of the above
} Example (figure): the burglary network, with Burglary and Earthquake as parents of Alarm, and JohnCalls and MaryCalls as children of Alarm
} A Bayesian network is a directed, acyclic graph
} Each node corresponds to a random variable
} A directed link from node X to node Y implies that X influences Y
  X is the parent of Y
} Each node X has a conditional probability distribution P(X | Parents(X))
  Quantifies the influence on X from its parent nodes
  Conditional probability table (CPT)
} Represents the full joint distribution
} Represents conditional independence
  E.g., JohnCalls is independent of Burglary and Earthquake given Alarm

  P(X_1 = x_1 ∧ … ∧ X_n = x_n) = P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | parents(X_i))
} P(b, ¬e, a, j, m) = P(b) P(¬e) P(a | b, ¬e) P(j | a) P(m | a)
  = (0.001)(0.998)(0.94)(0.90)(0.70) ≈ 0.000591
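As a quick check, the chain-rule factorization above can be evaluated directly (a minimal sketch; the CPT values are the ones from the example network, and the variable names are chosen here for readability):

```python
# Chain-rule factorization for one atomic event in the burglary network:
# P(b, ¬e, a, j, m) = P(b) P(¬e) P(a | b, ¬e) P(j | a) P(m | a)
P_b = 0.001                # P(Burglary = true)
P_not_e = 0.998            # P(Earthquake = false)
P_a_given_b_not_e = 0.94   # P(Alarm = true | b, ¬e)
P_j_given_a = 0.90         # P(JohnCalls = true | a)
P_m_given_a = 0.70         # P(MaryCalls = true | a)

joint = P_b * P_not_e * P_a_given_b_not_e * P_j_given_a * P_m_given_a
print(round(joint, 6))  # → 0.000591
```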
} Determine the set of random variables {X_1, …, X_n}
} Order them so that causes precede effects
} For i = 1 to n do
  Choose a minimal set of parents for X_i such that P(X_i | X_{i-1}, …, X_1) = P(X_i | Parents(X_i))
  For each parent X_k, insert a link from X_k to X_i
  Write down the CPT, P(X_i | Parents(X_i))
} E.g., Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
} Bad orderings lead to more complex networks with more CPT entries
  a) MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
  b) MaryCalls, JohnCalls, Earthquake, Burglary, Alarm
} Example: Tooth World
  Variables: Cavity, Toothache, Catch

  P(Cavity) = 0.2

  Cavity | P(Toothache | Cavity)      Cavity | P(Catch | Cavity)
  t      | 0.6                        t      | 0.9
  f      | 0.1                        f      | 0.2

  Full joint distribution:
              toothache           ¬toothache
              catch    ¬catch     catch    ¬catch
  cavity      0.108    0.012      0.072    0.008
  ¬cavity     0.016    0.064      0.144    0.576
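The entire joint table above can be rebuilt from just the three CPTs, since Toothache and Catch are conditionally independent given Cavity (a minimal sketch; the `joint` helper is introduced here for illustration):

```python
# Rebuild the Tooth World joint distribution from the CPTs,
# using P(cav, tooth, catch) = P(cav) P(tooth | cav) P(catch | cav).
P_cavity = 0.2
P_tooth = {True: 0.6, False: 0.1}   # P(Toothache = true | Cavity)
P_catch = {True: 0.9, False: 0.2}   # P(Catch = true | Cavity)

def joint(cav, tooth, catch):
    p = P_cavity if cav else 1 - P_cavity
    p *= P_tooth[cav] if tooth else 1 - P_tooth[cav]
    p *= P_catch[cav] if catch else 1 - P_catch[cav]
    return p

print(round(joint(True, True, True), 3))     # → 0.108
print(round(joint(False, False, False), 3))  # → 0.576
```

Three numbers (plus the normalization constraint) reproduce all eight joint entries, which is exactly the savings conditional independence buys.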
} Node X is conditionally independent of its non-descendants (the Z_ij's) given its parents (the U_i's)
} The Markov blanket of node X is X's parents (U_i's), children (Y_i's), and children's parents (Z_ij's)
} Node X is conditionally independent of all other nodes in the network given its Markov blanket
} Want P(X | e)
} X is the query variable (there can be more than one)
} e is an observed event, i.e., values for the evidence variables E = {E_1, …, E_m}
} Any other variables Y are hidden variables
} Example: P(Burglary | JohnCalls = true, MaryCalls = true) = ?
  X = Burglary
  e = {JohnCalls = true, MaryCalls = true}
  Y = {Earthquake, Alarm}
} Enumerate over all possible values for Y
  P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
} Example: P(Burglary | JohnCalls = true, MaryCalls = true)
  P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a)
  Let b represent (B = true):
  P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
  P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
} P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
} P(B | j, m) = α ⟨0.0005922, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values of variables E
          bn, a Bayes net with variables {X} ∪ E ∪ Y   // Y = hidden variables

  Q(X) ← a distribution over X, initially empty
  for each value x_i of X do
    Q(x_i) ← ENUMERATE-ALL(bn.VARS, e_xi)
      where e_xi is e extended with X = x_i
  return NORMALIZE(Q(X))

// bn.VARS lists the variables in cause→effect order
function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has value y in e
    then return P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return Σ_y P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e_y)
      where e_y is e extended with Y = y
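The pseudocode above can be rendered as a short Python sketch for the burglary network (assumptions: Boolean variables only, CPTs encoded as a dict of functions introduced here, and the standard example CPT values):

```python
# Inference by enumeration on the burglary network.
# Each CPT entry returns P(variable = true | its parents' values in e).
CPT = {
    'B': lambda e: 0.001,                      # P(Burglary = true)
    'E': lambda e: 0.002,                      # P(Earthquake = true)
    'A': lambda e: {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}[(e['B'], e['E'])],
    'J': lambda e: 0.90 if e['A'] else 0.05,   # P(JohnCalls = true | Alarm)
    'M': lambda e: 0.70 if e['A'] else 0.01,   # P(MaryCalls = true | Alarm)
}
VARIABLES = ['B', 'E', 'A', 'J', 'M']          # cause→effect order

def enumerate_all(variables, e):
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in e:                                  # evidence: use its value
        p = CPT[Y](e) if e[Y] else 1 - CPT[Y](e)
        return p * enumerate_all(rest, e)
    total = 0.0                                 # hidden: sum over both values
    for y in (True, False):
        ey = dict(e, **{Y: y})
        p = CPT[Y](ey) if y else 1 - CPT[Y](ey)
        total += p * enumerate_all(rest, ey)
    return total

def enumeration_ask(X, e):
    q = {x: enumerate_all(VARIABLES, dict(e, **{X: x})) for x in (True, False)}
    z = sum(q.values())                         # normalization constant α
    return {x: p / z for x, p in q.items()}

dist = enumeration_ask('B', {'J': True, 'M': True})
print(round(dist[True], 3))  # → 0.284
```

This reproduces the query worked on the previous slide: P(B = true | j, m) ≈ 0.284.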
} ENUMERATION-ASK evaluates trees using depth-first recursion
} Space complexity: O(n)
} Time complexity: O(v^n), where each of the n variables has v possible values
} Note the redundant computation in the evaluation tree: the products P(j | a) P(m | a) are recomputed for each value of e
} Avoid redundant computation
  Dynamic programming: store intermediate computations and reuse them
} Eliminate irrelevant variables
  Variables that are not an ancestor of a query or evidence variable
} General case (any type of network)
  Worst-case space and time complexity is exponential
} A polytree is a network with at most one undirected path between any two nodes
  Space and time complexity is linear in the size of the network
  (figure: a polytree vs. a network that is not a polytree)
} Improbability Drive, The Hitchhiker's Guide to the Galaxy (2005)
} Exact inference can be too expensive
} Approximate inference
  Estimate probabilities from samples, rather than computing them exactly
} Monte Carlo methods
  Choose values for hidden variables, compute query variables, repeat and average
} Direct sampling
} Markov chain sampling
} Choose a value for each variable according to its CPT
  Consider variables in topological order
} E.g.,
  P(B) = ⟨0.001, 0.999⟩ → B = false
  P(E) = ⟨0.002, 0.998⟩ → E = false
  P(A | B=false, E=false) = ⟨0.001, 0.999⟩ → A = false
  P(J | A=false) = ⟨0.05, 0.95⟩ → J = false
  P(M | A=false) = ⟨0.01, 0.99⟩ → M = false
  Sample is [false, false, false, false, false]

  P(X = x_i) ≈ (# samples where X = x_i) / (# samples)
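The direct-sampling procedure above can be sketched in a few lines (assumptions: the standard example CPTs, and a `prob_true` helper introduced here; the estimate ≈ 0.052 for P(JohnCalls = true) follows from summing out the other variables exactly):

```python
# Direct (prior) sampling for the burglary network: sample each variable
# in topological order from its CPT, then estimate P(X = x_i) by the
# fraction of samples in which X = x_i.
import random

def prob_true(var, sample):        # P(var = true | parent values so far)
    if var == 'B': return 0.001
    if var == 'E': return 0.002
    if var == 'A': return {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29,
                           (False, False): 0.001}[(sample['B'], sample['E'])]
    if var == 'J': return 0.90 if sample['A'] else 0.05
    return 0.70 if sample['A'] else 0.01   # 'M'

def prior_sample():
    s = {}
    for var in ['B', 'E', 'A', 'J', 'M']:  # topological (cause→effect) order
        s[var] = random.random() < prob_true(var, s)
    return s

random.seed(0)
n = 100_000
samples = [prior_sample() for _ in range(n)]
est = sum(s['J'] for s in samples) / n     # estimate of P(JohnCalls = true)
print(round(est, 3))  # typically close to the exact value ≈ 0.052
```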
} New sample generated by random changes to the preceding sample
  Not generated from scratch
} Markov Chain Monte Carlo (MCMC)
} Gibbs sampling
  Fix evidence variables to their observed values
  Randomly set the non-evidence variables
  Repeat for the desired number of samples:
    Randomly choose a non-evidence variable X
    Randomly choose a new value for X given its Markov blanket
    Generate a sample
} Gibbs sampling example: P(B | J=true, M=true)
  Evidence: J = true, M = true
  Random initial state: B = true, E = false, A = true
  P(B) = ⟨0.001, 0.999⟩ → B = false;  Sample 1: [false, false, true, true, true]
  P(E) = ⟨0.002, 0.998⟩ → E = false;  Sample 2: [false, false, true, true, true]
  P(A | B=false, E=false) = ⟨0.001, 0.999⟩ → A = false;  Sample 3: [false, false, false, true, true]
  Repeat for more B's, E's, and A's (in any order)
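A fuller Gibbs sampler resamples each non-evidence variable from its distribution given its Markov blanket, P(x | mb(X)) ∝ P(x | parents(X)) · ∏ P(child | its parents). The sketch below (my own encoding of the example CPTs; `gibbs_ask_b` and its helpers are names introduced here) estimates P(B | j, m):

```python
# Gibbs sampling for P(B | J=true, M=true) in the burglary network.
import random

def p_true(var, s):  # P(var = true | its parents' values in state s)
    if var == 'B': return 0.001
    if var == 'E': return 0.002
    if var == 'A': return {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29,
                           (False, False): 0.001}[(s['B'], s['E'])]
    if var == 'J': return 0.90 if s['A'] else 0.05
    return 0.70 if s['A'] else 0.01   # 'M'

def p(var, s):       # P(var = s[var] | its parents)
    pt = p_true(var, s)
    return pt if s[var] else 1 - pt

CHILDREN = {'B': ['A'], 'E': ['A'], 'A': ['J', 'M'], 'J': [], 'M': []}

def gibbs_ask_b(n_samples, burn_in=1000):
    random.seed(1)
    s = {'J': True, 'M': True,              # evidence, fixed throughout
         'B': random.random() < 0.5,        # random initial state
         'E': random.random() < 0.5,
         'A': random.random() < 0.5}
    count_b = 0
    for i in range(n_samples + burn_in):
        for var in ['B', 'E', 'A']:         # non-evidence variables
            weights = []                    # unnormalized P(var = val | mb)
            for val in (True, False):
                s[var] = val
                w = p(var, s)
                for c in CHILDREN[var]:
                    w *= p(c, s)
                weights.append(w)
            s[var] = random.random() < weights[0] / (weights[0] + weights[1])
            if i >= burn_in:
                count_b += s['B']           # one sample per variable update
    return count_b / (3 * n_samples)

est = gibbs_ask_b(20000)
print(round(est, 2))  # typically close to the exact answer 0.284
```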
} Sampling techniques converge to the correct probabilities given enough samples
} Commercial
  Bayes Server (www.bayesserver.com)
  BayesiaLab (www.bayesia.com)
  Netica APIs (www.norsys.com)
  HUGIN (www.hugin.com)
} Free
  BayesPy (www.bayespy.org)
  JavaBayes (www.cs.cmu.edu/~javabayes)
  SMILE (www.bayesfusion.com)
} Sample networks
  www.bnlearn.com/bnrepository
} Bayesian networks
  Capture the full joint probability distribution and conditional independence
} Exact inference
  Intractable in the worst case
} Approximate inference
  Sampling; converges to exact inference given enough samples