Bayesian Networks. Axioms of Probability Theory. Conditional Probability. Inference by Enumeration. CSE 473

Transcription:

Bayesian Networks (CSE 473), Daniel S. Weld

Last Time: Basic notions: atomic events, probabilities, joint distribution, inference by enumeration, independence & conditional independence, Bayes rule. Bayesian networks. Statistical learning. Dynamic Bayesian networks (DBNs). Markov decision processes (MDPs).

Axioms of Probability Theory: All probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1, P(true) = 1, P(false) = 0. The probability of a disjunction is: P(A ∨ B) = P(A) + P(B) − P(A ∧ B).

Conditional Probability: P(A | B) is the probability of A given B. Assumes that B is the only info known. Defined by: P(A | B) = P(A ∧ B) / P(B).

Inference by Enumeration: P(toothache) = .108 + .012 + .016 + .064 = .20, or 20%.
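
The enumeration above can be reproduced in a few lines of code. Below is a minimal sketch, assuming the standard AIMA toothache/cavity/catch joint table (the slide quotes only the four entries that sum to P(toothache)); the helper names prob and cond_prob are purely illustrative.

```python
# Full joint over (toothache, cavity, catch): the standard AIMA dentist table,
# which is where the .108 + .012 + .016 + .064 figures come from.
joint = {
    # (toothache, cavity, catch): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
VARS = ("toothache", "cavity", "catch")

def prob(**fixed):
    """Marginal probability of an assignment, by summing matching joint entries."""
    return sum(p for event, p in joint.items()
               if all(event[VARS.index(v)] == val for v, val in fixed.items()))

def cond_prob(query, given):
    """P(query | given) = P(query, given) / P(given)."""
    return prob(**query, **given) / prob(**given)

print(prob(toothache=True))                              # 0.2
print(cond_prob({"cavity": True}, {"toothache": True}))  # 0.6
```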

Problems?? Worst-case time: O(d^n), where d = max arity and n = number of random variables. Space complexity is also O(d^n): the size of the joint distribution. How do we get the O(d^n) entries for the table?? Note that the values of cavity & catch are irrelevant when computing P(toothache).

Independence: A and B are independent iff P(A | B) = P(A) and P(B | A) = P(B). These two constraints are logically equivalent. Therefore, if A and B are independent: P(A ∧ B) = P(A) P(B).

Independence: Complete independence is powerful but rare. What to do if it doesn't hold?

Conditional Independence: A & B are not independent, since P(A | B) < P(A). But: A & B are made independent by C: P(A | C) = P(A | B, C).
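
As a quick illustration of both claims, the sketch below checks independence and conditional independence numerically against the same assumed toothache/cavity/catch joint used earlier; the predicate-based helper is ad hoc.

```python
# Check (conditional) independence claims numerically against the joint table.
joint = {  # keys: (toothache, cavity, catch)
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p(pred):
    """Probability of the event selected by a predicate over (toothache, cavity, catch)."""
    return sum(pr for ev, pr in joint.items() if pred(ev))

# Unconditional: P(toothache, catch) vs P(toothache) * P(catch)
print(p(lambda e: e[0] and e[2]), p(lambda e: e[0]) * p(lambda e: e[2]))
# 0.124 vs 0.068 -> toothache and catch are NOT independent

# Conditional on cavity: P(toothache, catch | cavity) vs
#                        P(toothache | cavity) * P(catch | cavity)
pc = p(lambda e: e[1])
lhs = p(lambda e: e[0] and e[2] and e[1]) / pc
rhs = (p(lambda e: e[0] and e[1]) / pc) * (p(lambda e: e[2] and e[1]) / pc)
print(lhs, rhs)   # 0.54 vs 0.54 -> independent given cavity
```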

Conditional Independence: P(catch | toothache, cavity) = P(catch | cavity) and P(catch | toothache, ¬cavity) = P(catch | ¬cavity).

Conditional Independence II: Why only 5 entries in the table? Instead of 7 entries, only need 5: the full joint over Toothache, Catch, Cavity has 2^3 − 1 = 7 independent entries, but with Catch independent of Toothache given Cavity it decomposes into P(Cavity), P(Toothache | Cavity), and P(Catch | Cavity), i.e. 1 + 2 + 2 = 5 numbers.

Power of Cond. Independence: Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!! Conditional independence is the most basic & robust form of knowledge about uncertain environments.

Bayes Rule: P(H | E) = P(E | H) P(H) / P(E). Simple proof from the definition of conditional probability:
1. P(E | H) = P(H ∧ E) / P(H)   (Def. cond. prob.)
2. P(H | E) = P(H ∧ E) / P(E)   (Def. cond. prob.)
3. P(H ∧ E) = P(E | H) P(H)     (Multiply by P(H) in line 1)
QED: P(H | E) = P(E | H) P(H) / P(E)   (Substitute #3 in #2)

Use Bayes Rule to Compute Diagnostic Probability from Causal Probability: e.g., let M be meningitis, S be stiff neck. P(M) = 0.0001, P(S) = 0.1, P(S | M) = 0.8. Then P(M | S) = P(S | M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008.

Bayes Rule & Cond. Independence.
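
The diagnostic computation is a one-line application of Bayes rule; a tiny sketch using the slide's meningitis numbers:

```python
# Bayes rule with the slide's numbers: P(M) = 0.0001, P(S) = 0.1, P(S | M) = 0.8.
p_m, p_s, p_s_given_m = 0.0001, 0.1, 0.8

p_m_given_s = p_s_given_m * p_m / p_s   # diagnostic direction from causal numbers
print(p_m_given_s)                      # 0.0008: a stiff neck barely raises P(meningitis)
```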

Bayes Nets: In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference. BNs provide a graphical representation of the conditional independence relations in P: usually quite compact; requires assessment of fewer parameters, those being quite natural (e.g., causal); efficient (usually) inference: query answering and belief update.

Independence (in the extreme): If X1, X2, ..., Xn are mutually independent, then P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn), so the joint can be specified with n parameters (cf. the usual 2^n − 1 parameters required). While extreme independence is unusual, conditional independence is common. BNs exploit this conditional independence.

An Example Bayes Net / Example (con't): A network over Burglary (B), Earthquake (E), Alarm (A), Radio, and the neighbor-call variables N1 and N2, with CPT Pr(A | E, B): e,b: 0.9 (0.1); e,¬b: 0.2 (0.8); ¬e,b: 0.85 (0.15); ¬e,¬b: 0.01 (0.99), and a prior table Pr(t) = 0.05, Pr(f) = 0.95. If I know whether the Alarm went off, no other evidence influences my degree of belief in N1: P(N1 | N2, A, E, B) = P(N1 | A); also: P(N2 | N1, A, E, B) = P(N2 | A) and P(E | B) = P(E). By the chain rule we have P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) P(N2 | A, E, B) P(A | E, B) P(E | B) P(B) = P(N1 | A) P(N2 | A) P(A | B, E) P(E) P(B). The full joint requires only 10 parameters (cf. 32).

BNs: Qualitative Structure: The graphical structure of a BN reflects conditional independence among variables. Each variable X is a node in the DAG; edges denote direct probabilistic influence, usually interpreted causally; the parents of X are denoted Par(X). X is conditionally independent of all non-descendants given its parents. A graphical test exists for more general independence: the Markov Blanket.

Given Parents, X is Independent of Non-Descendants.
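
To make the 10-parameter factorization concrete, here is a small sketch that builds the joint from the CPTs and checks that it sums to 1. Only the P(A | E, B) table and the 0.05/0.95 prior come from the slide; the Earthquake prior and the P(Ni | A) tables are illustrative placeholders, not lecture values.

```python
from itertools import product

# Minimal sketch of the factored joint
#   P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|B,E) P(E) P(B)
P_B_TRUE = 0.05                      # assumed: the slide's 0.05 / 0.95 prior
P_E_TRUE = 0.03                      # placeholder prior for Earthquake
P_A_TRUE = {(True, True): 0.9, (True, False): 0.2,
            (False, True): 0.85, (False, False): 0.01}   # P(A=t | E, B) from the slide
P_N_TRUE = {True: 0.8, False: 0.05}  # placeholder P(Ni=t | A), same for N1 and N2

def bern(p_true, value):
    """Probability of a boolean value under a Bernoulli parameter."""
    return p_true if value else 1.0 - p_true

def joint(n1, n2, a, e, b):
    """Chain-rule product exploiting the net's conditional independences."""
    return (bern(P_N_TRUE[a], n1) * bern(P_N_TRUE[a], n2) *
            bern(P_A_TRUE[(e, b)], a) * bern(P_E_TRUE, e) * bern(P_B_TRUE, b))

# 10 CPT parameters (2 + 2 + 4 + 1 + 1) define all 32 joint entries, which sum to 1.
print(sum(joint(*v) for v in product([True, False], repeat=5)))
```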

For Example / Given Markov Blanket, X is Independent of All Other Nodes: MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)). (Illustrated on the Burglary / Radio / Alarm network.)

Conditional Probability Tables: For a complete specification of the joint distribution, quantify the BN. For each variable X, specify a CPT: P(X | Par(X)); the number of parameters is locally exponential in |Par(X)|. (Example: Pr(A | E, B): e,b: 0.9 (0.1); e,¬b: 0.2 (0.8); ¬e,b: 0.85 (0.15); ¬e,¬b: 0.01 (0.99); prior Pr(t) = 0.05, Pr(f) = 0.95.) If X1, X2, ..., Xn is any topological sort of the network, then we are assured: P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) P(Xn-1 | Xn-2, ..., X1) ... P(X2 | X1) P(X1) = P(Xn | Par(Xn)) P(Xn-1 | Par(Xn-1)) ... P(X1).

Inference in BNs: The graphical independence representation yields efficient inference schemes. We generally want to compute Pr(X), or Pr(X | E) where E is (conjunctive) evidence. Computations are organized by network topology. One simple algorithm: variable elimination (VE).

P(B | J=true, M=true): on the alarm network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls, Radio), P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a).
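
A small sketch of that enumeration query follows. The slide does not list CPT numbers for this query, so the values below are the standard AIMA alarm-network parameters, used purely for illustration.

```python
from itertools import product

# Enumeration of P(B | J=true, M=true)
#   = alpha * P(B) * sum_e P(e) * sum_a P(a | B, e) P(j | a) P(m | a)
P_B, P_E = 0.001, 0.002                                  # assumed AIMA priors
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}       # P(A=t | B, E)
P_J = {True: 0.90, False: 0.05}                          # P(J=t | A)
P_M = {True: 0.70, False: 0.01}                          # P(M=t | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def unnormalized(b):
    """P(b) * sum over e, a of P(e) P(a | b, e) P(j | a) P(m | a), with j = m = true."""
    total = 0.0
    for e, a in product([True, False], repeat=2):
        total += bern(P_E, e) * bern(P_A[(b, e)], a) * P_J[a] * P_M[a]
    return bern(P_B, b) * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})   # P(B=true | j, m) is roughly 0.28
```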

Structure of Computation: Variable Elimination as Dynamic Programming.

Variable Elimination: A factor is a function from some set of variables into a specific value, e.g., f(E, A, N1). CPTs are factors, e.g., P(A | E, B) is a function of A, E, B. VE works by eliminating all variables in turn until there is a factor with only the query variable. To eliminate a variable: join all factors containing that variable (like a DB join), then sum out the influence of the variable on the new factor. This exploits the product form of the joint distribution.

Example of VE: computing P(N1) in the Earthquake / Burglary / Alarm network.
P(N1) = Σ_{N2,A,B,E} P(N1, N2, A, B, E)
      = Σ_{N2,A,B,E} P(N1 | A) P(N2 | A) P(B) P(A | B, E) P(E)
      = Σ_A P(N1 | A) Σ_N2 P(N2 | A) Σ_B P(B) Σ_E P(A | B, E) P(E)
      = Σ_A P(N1 | A) Σ_N2 P(N2 | A) Σ_B P(B) f1(A, B)
      = Σ_A P(N1 | A) Σ_N2 P(N2 | A) f2(A)
      = Σ_A P(N1 | A) f3(A)
      = f4(N1)

Notes on VE: Each operation is simply a multiplication of factors and summing out a variable. Complexity is determined by the size of the largest factor: e.g., in the example, 3 vars (not 5); linear in the number of vars, exponential in the largest factor. The elimination ordering greatly impacts factor size; finding optimal elimination orderings is NP-hard, so use heuristics or special structure (e.g., polytrees). Practically, inference is much more tractable using structure of this sort.
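
The sketch below mirrors the join / sum-out description with factors stored as dictionaries, and runs the P(N1) elimination from the example. As before, the P(A | E, B) table is from the slides, while the priors and the P(Ni | A) entries are placeholders.

```python
from itertools import product

# Variable elimination sketch: a factor maps assignments over an ordered tuple
# of variable names to values.

class Factor:
    def __init__(self, variables, table):
        self.vars = tuple(variables)      # e.g. ("A", "E", "B")
        self.table = dict(table)          # {(a, e, b): value}

    def __mul__(self, other):
        """Join: pointwise product over the union of the two variable sets."""
        joined = self.vars + tuple(v for v in other.vars if v not in self.vars)
        table = {}
        for assign in product([True, False], repeat=len(joined)):
            env = dict(zip(joined, assign))
            table[assign] = (self.table[tuple(env[v] for v in self.vars)] *
                             other.table[tuple(env[v] for v in other.vars)])
        return Factor(joined, table)

    def sum_out(self, var):
        """Eliminate var by summing it out of this factor."""
        keep = tuple(v for v in self.vars if v != var)
        table = {}
        for assign, value in self.table.items():
            env = dict(zip(self.vars, assign))
            key = tuple(env[v] for v in keep)
            table[key] = table.get(key, 0.0) + value
        return Factor(keep, table)

def cpt(var, parents, p_true):
    """Build a factor for P(var | parents) from P(var=t | parent assignment)."""
    vars_ = (var,) + tuple(parents)
    table = {}
    for assign in product([True, False], repeat=len(vars_)):
        pt = p_true[assign[1:]]
        table[assign] = pt if assign[0] else 1.0 - pt
    return Factor(vars_, table)

factors = [
    cpt("B", (), {(): 0.05}),                             # assumed 0.05/0.95 prior
    cpt("E", (), {(): 0.03}),                             # placeholder
    cpt("A", ("E", "B"), {(True, True): 0.9, (True, False): 0.2,
                          (False, True): 0.85, (False, False): 0.01}),
    cpt("N1", ("A",), {(True,): 0.8, (False,): 0.05}),    # placeholder
    cpt("N2", ("A",), {(True,): 0.8, (False,): 0.05}),    # placeholder
]

# Eliminate everything except the query variable N1, in a fixed ordering.
for var in ("E", "B", "N2", "A"):
    related = [f for f in factors if var in f.vars]
    rest = [f for f in factors if var not in f.vars]
    prod = related[0]
    for f in related[1:]:
        prod = prod * f
    factors = rest + [prod.sum_out(var)]

answer = factors[0]
for f in factors[1:]:
    answer = answer * f
print(dict(answer.table))    # P(N1): entries for N1=true and N1=false, summing to 1
```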