Reasoning with Bayesian Networks
1 Reasoning with Bayesian Networks
Lecture 1: Probability Calculus, Bayesian Networks
NICTA and ANU
2 Overview of the Course
- Probability calculus, Bayesian networks
- Inference by variable elimination, factor elimination, conditioning
- Models for graph decomposition
- Most likely instantiations
- Complexity of probabilistic inference
- Compiling Bayesian networks
- Inference with local structure
- Applications
3 Probabilities of Worlds
- Worlds are modeled with propositional variables
- A world is a complete instantiation of the variables
- Probabilities of all worlds add up to 1
- An event is a set of worlds
4-5 Probabilities of Events
Pr(α) ≜ Σ_{ω ∈ α} Pr(ω)
Pr(Earthquake) = Pr(ω1) + Pr(ω2) + Pr(ω3) + Pr(ω4) = .1
Pr(¬Burglary) = .8
Pr(Alarm) = .2442
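These degrees of belief can be reproduced by enumerating the eight worlds over Earthquake, Burglary, and Alarm. The world probabilities below are an assumption: the original table did not survive the transcription, so the numbers here are a reconstruction chosen to be consistent with every figure quoted in these slides.

```python
# Assumed joint distribution over worlds (Earthquake, Burglary, Alarm).
# The exact numbers are a reconstruction consistent with the slides' figures.
worlds = {
    (True,  True,  True):  0.0190,   # omega_1 .. omega_4 are the Earthquake worlds
    (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560,
    (True,  False, False): 0.0240,
    (False, True,  True):  0.1620,
    (False, True,  False): 0.0180,
    (False, False, True):  0.0072,
    (False, False, False): 0.7128,
}

def pr(event):
    """Pr(alpha): sum the probabilities of the worlds in the event."""
    return sum(p for w, p in worlds.items() if event(*w))

print(round(pr(lambda e, b, a: e), 4))      # Pr(Earthquake) -> 0.1
print(round(pr(lambda e, b, a: not b), 4))  # Pr(not Burglary) -> 0.8
print(round(pr(lambda e, b, a: a), 4))      # Pr(Alarm) -> 0.2442
```

Representing an event as a predicate over worlds mirrors the definition above: an event is just the set of worlds that satisfy it.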
6-7 Propositional Sentences as Events
Pr((Earthquake ∨ Burglary) ∧ Alarm) = ?
Interpret world ω as a model (satisfying assignment) for propositional sentence α
Pr(α) ≜ Σ_{ω ⊨ α} Pr(ω)
8 Probability Calculus
Suppose we are given evidence β that Alarm = true
Need to update Pr(·) to Pr(· | β)
9-14 Probability Calculus
- Pr(β | β) = 1, Pr(¬β | β) = 0
- Partition worlds into those ⊨ β and those ⊨ ¬β
- Pr(ω | β) = 0 for all ω ⊨ ¬β
- Relative beliefs stay put for ω ⊨ β
- Σ_{ω ⊨ β} Pr(ω) = Pr(β) and Σ_{ω ⊨ β} Pr(ω | β) = 1
The above imply that we normalize by the constant Pr(β):
Pr(ω | β) ≜ 0 if ω ⊨ ¬β, and Pr(ω)/Pr(β) if ω ⊨ β
16 Bayes Conditioning
Pr(α | β) = Σ_{ω ⊨ α} Pr(ω | β)
          = Σ_{ω ⊨ α, ω ⊨ β} Pr(ω | β) + Σ_{ω ⊨ α, ω ⊨ ¬β} Pr(ω | β)
          = Σ_{ω ⊨ α∧β} Pr(ω | β)
          = Σ_{ω ⊨ α∧β} Pr(ω) / Pr(β)
          = (1 / Pr(β)) Σ_{ω ⊨ α∧β} Pr(ω)
          = Pr(α ∧ β) / Pr(β)
17-20 Bayes Conditioning
Pr(Burglary) = .2
Pr(Burglary | Alarm) ≈ .741
Pr(Burglary | Alarm ∧ Earthquake) ≈ .253
Pr(Burglary | Alarm ∧ ¬Earthquake) ≈ .957
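Bayes conditioning on this example can be sketched as follows. The world table is an assumed reconstruction consistent with the slides' numbers, not taken verbatim from them:

```python
# Assumed world table (Earthquake, Burglary, Alarm) -> probability,
# reconstructed to match the figures quoted in the slides.
worlds = {
    (True,  True,  True):  0.0190, (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560, (True,  False, False): 0.0240,
    (False, True,  True):  0.1620, (False, True,  False): 0.0180,
    (False, False, True):  0.0072, (False, False, False): 0.7128,
}

def pr(event):
    return sum(p for w, p in worlds.items() if event(*w))

def pr_given(alpha, beta):
    """Bayes conditioning: Pr(alpha | beta) = Pr(alpha and beta) / Pr(beta)."""
    return pr(lambda *w: alpha(*w) and beta(*w)) / pr(beta)

burglary = lambda e, b, a: b
print(round(pr_given(burglary, lambda e, b, a: a), 3))            # 0.741
print(round(pr_given(burglary, lambda e, b, a: a and e), 3))      # 0.253
print(round(pr_given(burglary, lambda e, b, a: a and not e), 3))  # 0.957
```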
21-24 Independence
Pr(Earthquake) = .1 and Pr(Earthquake | Burglary) = .1
Event α is independent of event β if Pr(α | β) = Pr(α) or Pr(β) = 0
Or equivalently: Pr(α ∧ β) = Pr(α) Pr(β)
Independence is symmetric
25-26 Conditional Independence
Independence is a dynamic notion: Burglary is independent of Earthquake, but not after accepting evidence Alarm
Pr(Burglary | Alarm) ≈ .741
Pr(Burglary | Alarm ∧ Earthquake) ≈ .253
α is independent of β given γ if Pr(α ∧ β | γ) = Pr(α | γ) Pr(β | γ) or Pr(γ) = 0
27 Variable Independence
For disjoint sets of variables X, Y, and Z:
I_Pr(X, Z, Y): X is independent of Y given Z in distribution Pr
Means x is independent of y given z for all instantiations x, y, z of X, Y, Z
28 Chain Rule
Pr(α1 ∧ α2 ∧ ... ∧ αn) = Pr(α1 | α2 ∧ ... ∧ αn) Pr(α2 | α3 ∧ ... ∧ αn) ... Pr(αn)
29 Case Analysis
Pr(α) = Σ_{i=1}^{n} Pr(α ∧ βi)
Or equivalently: Pr(α) = Σ_{i=1}^{n} Pr(α | βi) Pr(βi)
where the events βi are mutually exclusive and collectively exhaustive
30 Case Analysis
Special case of n = 2:
Pr(α) = Pr(α ∧ β) + Pr(α ∧ ¬β)
Or equivalently: Pr(α) = Pr(α | β) Pr(β) + Pr(α | ¬β) Pr(¬β)
31 Bayes Rule
Pr(α | β) = Pr(β | α) Pr(α) / Pr(β)
32-34 Bayes Rule
Example: a patient tested positive for a disease
D: patient has the disease; T: test comes out positive
Would like to know Pr(D | T)
We know:
Pr(D) = 1/1000
Pr(T | ¬D) = 2/100 (false positive rate)
Pr(¬T | D) = 5/100 (false negative rate)
Using Bayes rule: Pr(D | T) = Pr(T | D) Pr(D) / Pr(T)
35-36 Bayes Rule
Computing Pr(T) by case analysis:
Pr(T) = Pr(T | D) Pr(D) + Pr(T | ¬D) Pr(¬D)
      = (95/100)(1/1000) + (2/100)(999/1000) = .02093
Hence Pr(D | T) = (95/100)(1/1000) / .02093 ≈ .0454, i.e., about 4.5%
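The two steps above (case analysis for Pr(T), then Bayes rule) are easy to check numerically:

```python
# Given quantities from the example.
p_d = 1 / 1000          # Pr(D): prior probability of the disease
p_t_not_d = 2 / 100     # Pr(T | not D): false positive rate
p_not_t_d = 5 / 100     # Pr(not T | D): false negative rate
p_t_d = 1 - p_not_t_d   # Pr(T | D) = 95/100

# Case analysis: Pr(T) = Pr(T | D) Pr(D) + Pr(T | not D) Pr(not D)
p_t = p_t_d * p_d + p_t_not_d * (1 - p_d)

# Bayes rule: Pr(D | T) = Pr(T | D) Pr(D) / Pr(T)
p_d_t = p_t_d * p_d / p_t
print(round(p_t, 5))    # 0.02093
print(round(p_d_t, 4))  # 0.0454, i.e., about 4.5%
```

Despite the positive test, the disease remains unlikely because its prior is so low; this is the classic base-rate effect that Bayes rule makes explicit.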
37 Soft Evidence
So far we have considered hard evidence: that some event has occurred
Soft evidence: a neighbor with a hearing problem calls to tell us they heard the alarm
May not confirm Alarm, but can still increase our belief in Alarm
Two main methods to specify the strength of soft evidence
38-39 Soft Evidence: All Things Considered
After receiving my neighbor's call, my belief in Alarm is now .85
Soft evidence as a constraint Pr'(β) = q
Not a statement about the strength of the evidence per se, but the result of its integration with our initial beliefs
Scale beliefs in worlds ⊨ β so they add up to q
Scale beliefs in worlds ⊨ ¬β so they add up to 1 − q
40-42 Soft Evidence: All Things Considered
For worlds:
Pr'(ω) ≜ (q / Pr(β)) · Pr(ω) if ω ⊨ β
Pr'(ω) ≜ ((1 − q) / Pr(¬β)) · Pr(ω) if ω ⊨ ¬β
For events (Jeffrey's rule):
Pr'(α) = q Pr(α | β) + (1 − q) Pr(α | ¬β)
Bayes conditioning is the special case q = 1
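The world-level update can be sketched as below, with q = .85 for the neighbor's call. The world table is again an assumed reconstruction consistent with the slides' figures:

```python
# Assumed world table (Earthquake, Burglary, Alarm) -> probability,
# reconstructed to match the figures quoted in the slides.
worlds = {
    (True,  True,  True):  0.0190, (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560, (True,  False, False): 0.0240,
    (False, True,  True):  0.1620, (False, True,  False): 0.0180,
    (False, False, True):  0.0072, (False, False, False): 0.7128,
}

def soft_update(worlds, beta, q):
    """Scale worlds satisfying beta to total q, the others to total 1 - q."""
    p_beta = sum(p for w, p in worlds.items() if beta(*w))
    return {w: p * (q / p_beta if beta(*w) else (1 - q) / (1 - p_beta))
            for w, p in worlds.items()}

alarm = lambda e, b, a: a
new = soft_update(worlds, alarm, q=0.85)

print(round(sum(p for w, p in new.items() if w[2]), 3))  # Pr'(Alarm) = 0.85
print(round(sum(new.values()), 3))                       # still a distribution: 1.0
print(round(sum(p for w, p in new.items() if w[1]), 3))  # Pr'(Burglary) -> 0.634
```

Summing Pr'(Burglary) over the updated worlds gives the same value as applying Jeffrey's rule at the event level, q · Pr(Burglary | Alarm) + (1 − q) · Pr(Burglary | ¬Alarm).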
43-47 Soft Evidence: Nothing Else Considered
Would like to declare the strength of evidence independently of current beliefs
Need the notion of the odds of an event β: O(β) ≜ Pr(β) / Pr(¬β)
Declare evidence on β by a Bayes factor k = O'(β) / O(β)
Example: my neighbor's call increases the odds of Alarm by a factor of 4
Compute Pr'(β) and fall back on Jeffrey's rule to update beliefs
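Converting a Bayes factor into Pr'(β) is a short calculation. Using Pr(Alarm) = .2442 and k = 4 from the example (the resulting Pr'(Alarm) ≈ .564 is computed here, not quoted in the transcription):

```python
def bayes_factor_update(p_beta, k):
    """Given Pr(beta) and a Bayes factor k = O'(beta)/O(beta),
    return Pr'(beta) via the odds O(beta) = Pr(beta) / (1 - Pr(beta))."""
    new_odds = k * p_beta / (1 - p_beta)
    return new_odds / (1 + new_odds)

q = bayes_factor_update(0.2442, 4)  # neighbor's call: odds of Alarm times 4
print(round(q, 3))  # 0.564; now apply Jeffrey's rule with this q
```

Unlike the "all things considered" method, the Bayes factor k = 4 describes the evidence itself, so the same call produces different posterior beliefs for agents with different priors.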
48 Probabilistic Reasoning
Outline for what follows: capturing independence graphically, conditional probability tables, graphoid axioms, d-separation
Basic task: belief updating in the face of hard and soft evidence
Can be done in principle using the definitions we have seen, but this is inefficient: it requires joint probability tables (exponential in the number of variables)
Also a modeling difficulty: specifying joint probabilities is tedious, and beliefs (e.g., independencies) held by human experts are not easily enforced
49-52 Capturing Independence Graphically
(Figure: DAG with edges B → A, E → A, E → R, A → C)
Evidence R could increase Pr(A) and Pr(C)
But not if we know A
C is independent of R given A
53 Capturing Independence Graphically
Parents(V): variables with an edge to V
Descendants(V): variables to which there is a directed path from V
Nondescendants(V): all variables except V, Parents(V), and Descendants(V)
54-55 Markovian Assumptions
Every variable is conditionally independent of its nondescendants given its parents:
I(V, Parents(V), Nondescendants(V))
Given the direct causes of a variable, our beliefs in that variable are no longer influenced by any other variable, except possibly by its effects
56 Capturing Independence Graphically
Markov(G) for the alarm DAG:
I(C, A, {B, E, R})
I(R, E, {A, B, C})
I(A, {B, E}, R)
I(B, ∅, {E, R})
I(E, ∅, B)
57 Conditional Probability Tables
A DAG G together with Markov(G) declares a set of independencies, but does not define Pr
Pr is uniquely defined if a conditional probability table (CPT) is provided for each variable X
CPT: Pr(x | u) for every value x of variable X and every instantiation u of its parents U
58-59 Conditional Probability Tables
CPTs for the alarm network: Pr(c | a), Pr(r | e), Pr(a | b, e), Pr(e), Pr(b)
60 Conditional Probability Tables
Half of the probabilities in these CPTs are redundant, since each row sums to 1
Only need 10 probabilities for the whole DAG
61 Bayesian Network
A pair (G, Θ) where
G is a DAG over variables Z, called the network structure
Θ is a set of CPTs, one for each variable in Z, called the network parametrization
62 Bayesian Network
Θ_{X|U}: CPT for variable X and its parents U
XU: a network family
θ_{x|u} = Pr(x | u): a network parameter
Σ_x θ_{x|u} = 1 for every parent instantiation u
63 Bayesian Network
Network instantiation: e.g., z = a, b, c, d, e
θ_{x|u} ∼ z: instantiations xu and z are compatible
θ_a, θ_{b|a}, θ_{c|a}, θ_{d|b,c}, θ_{e|c} are all ∼ the z above
64 Bayesian Network
The independence constraints imposed by the network structure (DAG) and the numeric constraints imposed by the network parametrization (CPTs) are satisfied by a unique distribution Pr:
Pr(z) ≜ Π_{θ_{x|u} ∼ z} θ_{x|u}   (chain rule)
A Bayesian network is an implicit representation of this probability distribution
65 Chain Rule Example
Pr(z) ≜ Π_{θ_{x|u} ∼ z} θ_{x|u}   (chain rule)
Pr(a, b, c, d, e) = θ_a θ_{b|a} θ_{c|a} θ_{d|b,c} θ_{e|c} = (.6)(.2)(.2)(.9)(.1) = .00216
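A minimal sketch of this computation. The five parameter values are from the slide, while the network shape (A → B, A → C, B and C → D, C → E) is inferred from the parents each parameter mentions:

```python
from math import prod

# Parameters compatible with the instantiation a, b, c, d, e: one per
# variable, each conditioned on its parents' values in the instantiation.
theta = {
    "a": 0.6,      # theta_a
    "b|a": 0.2,    # theta_{b|a}
    "c|a": 0.2,    # theta_{c|a}
    "d|b,c": 0.9,  # theta_{d|b,c}
    "e|c": 0.1,    # theta_{e|c}
}

# Chain rule: Pr(a, b, c, d, e) is the product of the compatible parameters.
p = prod(theta.values())
print(round(p, 5))  # 0.00216
```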
66 Compactness of Bayesian Networks
Let d be the maximum variable domain size and k the maximum number of parents
Size of any CPT: O(d^{k+1})
Total number of network parameters: O(n · d^{k+1}) for an n-variable network
Compact as long as k is relatively small, compared with the joint probability table (up to d^n entries)
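These counts specialize to the earlier alarm network (all variables binary). A sketch, counting (d − 1) · d^k independent parameters per CPT since each row must sum to 1:

```python
# Parent sets in the alarm network from the earlier slides.
parents = {"B": [], "E": [], "A": ["B", "E"], "R": ["E"], "C": ["A"]}
d = 2  # all variables binary

# Each CPT has d^(k+1) entries but only (d - 1) * d^k independent ones,
# because every row Pr(. | u) sums to 1 (half are redundant when d = 2).
n_params = sum((d - 1) * d ** len(ps) for ps in parents.values())
print(n_params)  # 10
```

This matches the earlier observation that only 10 probabilities are needed for the whole DAG, versus 2^5 − 1 = 31 for an explicit joint table.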
67 Properties of the Probability Distribution
The distribution Pr represented by a Bayesian network (G, Θ) is guaranteed to satisfy every independence assumption in Markov(G):
I_Pr(X, Parents(X), Nondescendants(X))
It may satisfy more; further independencies can be derived by the graphoid axioms
68 Graphoid Axioms: Symmetry
I_Pr(X, Z, Y) iff I_Pr(Y, Z, X)
If learning y does not influence our belief in x, then learning x does not influence our belief in y either
69 Graphoid Axioms: Decomposition
I_Pr(X, Z, Y ∪ W) implies I_Pr(X, Z, Y) and I_Pr(X, Z, W)
If learning yw does not influence our belief in x, then learning y alone, or w alone, does not influence our belief in x either
If some information is irrelevant, then any part of it is also irrelevant
The reverse does not hold in general: two pieces of information may each be irrelevant on their own, yet their combination may be relevant
70 Graphoid Axioms: Decomposition
I_Pr(X, Z, Y ∪ W) implies I_Pr(X, Z, Y) and I_Pr(X, Z, W)
In particular, we can strengthen Markov(G):
I_Pr(X, Parents(X), W) for every W ⊆ Nondescendants(X)
Useful for proving the chain rule for Bayesian networks from the chain rule of probability calculus
71 Graphoid Axioms: Weak Union
I_Pr(X, Z, Y ∪ W) implies I_Pr(X, Z ∪ Y, W)
If information yw is not relevant to our belief in x, then partial information y will not make the rest of the information, w, relevant
In particular, we can strengthen Markov(G):
I_Pr(X, Parents(X) ∪ W, Nondescendants(X) \ W) for every W ⊆ Nondescendants(X)
72 Graphoid Axioms: Contraction
I_Pr(X, Z, Y) and I_Pr(X, Z ∪ Y, W) imply I_Pr(X, Z, Y ∪ W)
If, after learning irrelevant information y, information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant to start with
73 Graphoid Axioms: Intersection
I_Pr(X, Z ∪ W, Y) and I_Pr(X, Z ∪ Y, W) imply I_Pr(X, Z, Y ∪ W) for strictly positive distributions Pr
If information w is irrelevant given y, and information y is irrelevant given w, then their combination is irrelevant to start with
Not true in general when there are zero probabilities (determinism)
74 Graphical Test of Independence
The graphoid axioms produce independencies beyond Markov(G)
Derivation of independencies is nontrivial; we need an efficient method
A linear-time graphical test: d-separation
75 d-separation
dsep_G(X, Z, Y): every path between X and Y is blocked by Z
dsep_G(X, Z, Y) implies I_Pr(X, Z, Y) for every Pr induced by G (soundness)
Also enjoys weak completeness (discussed later)
76-77 d-separation: Paths and Valves (figures: sequential, divergent, and convergent valves along a path)
78 d-separation: Path Blocking
A path is blocked if any valve on it is closed
Sequential valve → W →: closed iff W ∈ Z
Divergent valve ← W →: closed iff W ∈ Z
Convergent valve → W ←: closed iff W ∉ Z and none of Descendants(W) appears in Z
81 d-separation: Complexity
Need not enumerate the paths between X and Y
dsep_G(X, Z, Y) iff X and Y are disconnected in the new DAG G' obtained by pruning G as follows:
Repeatedly delete any leaf node W ∉ X ∪ Y ∪ Z
Delete all edges outgoing from nodes in Z
Linear in the size of G
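The pruning-based test can be sketched directly. The network and queries below are the alarm DAG from the earlier slides:

```python
from collections import deque

def d_separated(edges, X, Y, Z):
    """dsep_G(X, Z, Y) via pruning: repeatedly delete leaf nodes outside
    X | Y | Z, delete edges leaving Z, then test whether X and Y are
    disconnected (ignoring edge direction) in what remains."""
    edges = set(edges)
    keep = set(X) | set(Y) | set(Z)
    nodes = {n for e in edges for n in e} | keep
    # 1. Repeatedly delete leaves (nodes with no outgoing edges) not in X|Y|Z.
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            if n not in keep and not any(p == n for p, c in edges):
                nodes.discard(n)
                edges = {(p, c) for p, c in edges if c != n}
                changed = True
    # 2. Delete all edges outgoing from nodes in Z.
    edges = {(p, c) for p, c in edges if p not in Z}
    # 3. X and Y are d-separated iff no undirected path connects them.
    undirected = {}
    for p, c in edges:
        undirected.setdefault(p, set()).add(c)
        undirected.setdefault(c, set()).add(p)
    frontier, seen = deque(X), set(X)
    while frontier:
        n = frontier.popleft()
        if n in Y:
            return False
        for m in undirected.get(n, ()):
            if m not in seen:
                seen.add(m)
                frontier.append(m)
    return True

# Alarm network: B -> A, E -> A, E -> R, A -> C
G = [("B", "A"), ("E", "A"), ("E", "R"), ("A", "C")]
print(d_separated(G, {"C"}, {"R"}, {"A"}))   # True: A blocks the only path
print(d_separated(G, {"B"}, {"E"}, set()))   # True: convergent valve at A is closed
print(d_separated(G, {"B"}, {"E"}, {"A"}))   # False: observing A opens the valve
```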
82 d-separation: Soundness
dsep_G(X, Z, Y) implies I_Pr(X, Z, Y) for every Pr induced by G
Proved by showing that every independence claimed by d-separation can be derived from the graphoid axioms
83 d-separation: Completeness?
Completeness would require: I_Pr(X, Z, Y) implies dsep_G(X, Z, Y). This does not hold
Consider the chain X → Y → Z: X is not d-separated from Z (given ∅)
However, if θ_{y|x} = θ_{y|¬x} then X and Y are independent, and so are X and Z
Not surprising, as d-separation is based on structure alone
84 d-separation: Weak Completeness
For every DAG G, there exists a parametrization Θ such that
dsep_G(X, Z, Y) iff I_Pr(X, Z, Y)
where Pr is the distribution induced by the Bayesian network (G, Θ)
Implies that one cannot improve on d-separation as a graphical test
85 Summary
Probabilities of worlds and events, Bayes conditioning, independence, chain rule, case analysis, Bayes rule, two methods for soft evidence
Capturing independence graphically, Markovian assumptions, conditional probability tables, Bayesian networks, chain rule for Bayesian networks, graphoid axioms, d-separation
More informationDirected and Undirected Graphical Models
Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed
More informationLearning Bayesian Networks (part 1) Goals for the lecture
Learning Bayesian Networks (part 1) Mark Craven and David Page Computer Scices 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some ohe slides in these lectures have been adapted/borrowed from materials
More informationY. Xiang, Inference with Uncertain Knowledge 1
Inference with Uncertain Knowledge Objectives Why must agent use uncertain knowledge? Fundamentals of Bayesian probability Inference with full joint distributions Inference with Bayes rule Bayesian networks
More informationProbabilistic representation and reasoning
Probabilistic representation and reasoning Applied artificial intelligence (EDAF70) Lecture 04 2019-02-01 Elin A. Topp Material based on course book, chapter 13, 14.1-3 1 Show time! Two boxes of chocolates,
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationCausality in Econometrics (3)
Graphical Causal Models References Causality in Econometrics (3) Alessio Moneta Max Planck Institute of Economics Jena moneta@econ.mpg.de 26 April 2011 GSBC Lecture Friedrich-Schiller-Universität Jena
More informationCOMP5211 Lecture Note on Reasoning under Uncertainty
COMP5211 Lecture Note on Reasoning under Uncertainty Fangzhen Lin Department of Computer Science and Engineering Hong Kong University of Science and Technology Fangzhen Lin (HKUST) Uncertainty 1 / 33 Uncertainty
More informationBayesian Networks and Decision Graphs
Bayesian Networks and Decision Graphs A 3-week course at Reykjavik University Finn V. Jensen & Uffe Kjærulff ({fvj,uk}@cs.aau.dk) Group of Machine Intelligence Department of Computer Science, Aalborg University
More informationLearning Bayesian Network Parameters under Equivalence Constraints
Learning Bayesian Network Parameters under Equivalence Constraints Tiansheng Yao 1,, Arthur Choi, Adnan Darwiche Computer Science Department University of California, Los Angeles Los Angeles, CA 90095
More informationRecall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem
Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)
More informationBayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination
Outline Syntax Semantics Exact inference by enumeration Exact inference by variable elimination s A simple, graphical notation for conditional independence assertions and hence for compact specication
More informationSTATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Propositional Logic and Probability Theory: Review
STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas Propositional Logic and Probability Theory: Review Logic Logics are formal languages for representing information such that
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14
CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten
More informationCS839: Probabilistic Graphical Models. Lecture 2: Directed Graphical Models. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 2: Directed Graphical Models Theo Rekatsinas 1 Questions Questions? Waiting list Questions on other logistics 2 Section 1 1. Intro to Bayes Nets 3 Section
More informationBayesian Networks (Part II)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part II) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,
More informationLecture 8: December 17, 2003
Computational Genomics Fall Semester, 2003 Lecture 8: December 17, 2003 Lecturer: Irit Gat-Viks Scribe: Tal Peled and David Burstein 8.1 Exploiting Independence Property 8.1.1 Introduction Our goal is
More informationReview: Bayesian learning and inference
Review: Bayesian learning and inference Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E Inference problem:
More informationBayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018
Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline Probability
More informationBayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks Matt Gormley Lecture 24 April 9, 2018 1 Homework 7: HMMs Reminders
More informationCapturing Independence Graphically; Undirected Graphs
Capturing Independence Graphically; Undirected Graphs COMPSCI 276, Spring 2017 Set 2: Rina Dechter (Reading: Pearl chapters 3, Darwiche chapter 4) 1 Outline Graphical models: The constraint network, Probabilistic
More informationCS 188: Artificial Intelligence. Bayes Nets
CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew
More informationThis lecture. Reading. Conditional Independence Bayesian (Belief) Networks: Syntax and semantics. Chapter CS151, Spring 2004
This lecture Conditional Independence Bayesian (Belief) Networks: Syntax and semantics Reading Chapter 14.1-14.2 Propositions and Random Variables Letting A refer to a proposition which may either be true
More informationBayesian Networks aka belief networks, probabilistic networks. Bayesian Networks aka belief networks, probabilistic networks. An Example Bayes Net
Bayesian Networks aka belief networks, probabilistic networks A BN over variables {X 1, X 2,, X n } consists of: a DAG whose nodes are the variables a set of PTs (Pr(X i Parents(X i ) ) for each X i P(a)
More informationCSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-II
CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-II 1 Bayes Rule Example Disease {malaria, cold, flu}; Symptom = fever Must compute Pr(D fever) to prescribe treatment Why not assess
More informationCSE 473: Artificial Intelligence Autumn 2011
CSE 473: Artificial Intelligence Autumn 2011 Bayesian Networks Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore 1 Outline Probabilistic models
More informationEE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks
More information1. When applied to an affected person, the test comes up positive in 90% of cases, and negative in 10% (these are called false negatives ).
CS 70 Discrete Mathematics for CS Spring 2006 Vazirani Lecture 8 Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According to clinical trials,
More informationCS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine
CS 484 Data Mining Classification 7 Some slides are from Professor Padhraic Smyth at UC Irvine Bayesian Belief networks Conditional independence assumption of Naïve Bayes classifier is too strong. Allows
More informationBayesian Networks Representation
Bayesian Networks Representation Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University March 19 th, 2007 Handwriting recognition Character recognition, e.g., kernel SVMs a c z rr r r
More informationBayesian belief networks. Inference.
Lecture 13 Bayesian belief networks. Inference. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Midterm exam Monday, March 17, 2003 In class Closed book Material covered by Wednesday, March 12 Last
More informationFormalizing Probability. Choosing the Sample Space. Probability Measures
Formalizing Probability Choosing the Sample Space What do we assign probability to? Intuitively, we assign them to possible events (things that might happen, outcomes of an experiment) Formally, we take
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationUncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability
Problem of Logic Agents 22c:145 Artificial Intelligence Uncertainty Reading: Ch 13. Russell & Norvig Logic-agents almost never have access to the whole truth about their environments. A rational agent
More informationInformatics 2D Reasoning and Agents Semester 2,
Informatics 2D Reasoning and Agents Semester 2, 2017 2018 Alex Lascarides alex@inf.ed.ac.uk Lecture 23 Probabilistic Reasoning with Bayesian Networks 15th March 2018 Informatics UoE Informatics 2D 1 Where
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationUncertainty and Belief Networks. Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2!
Uncertainty and Belief Networks Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2! This lecture Conditional Independence Bayesian (Belief) Networks: Syntax and semantics
More informationIntroduction to Artificial Intelligence Belief networks
Introduction to Artificial Intelligence Belief networks Chapter 15.1 2 Dieter Fox Based on AIMA Slides c S. Russell and P. Norvig, 1998 Chapter 15.1 2 0-0 Outline Bayesian networks: syntax and semantics
More informationProf. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course
Course on Bayesian Networks, winter term 2007 0/31 Bayesian Networks Bayesian Networks I. Bayesian Networks / 1. Probabilistic Independence and Separation in Graphs Prof. Dr. Lars Schmidt-Thieme, L. B.
More informationDiscrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10
EECS 70 Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10 Introduction to Basic Discrete Probability In the last note we considered the probabilistic experiment where we flipped
More informationMarkov properties for undirected graphs
Graphical Models, Lecture 2, Michaelmas Term 2011 October 12, 2011 Formal definition Fundamental properties Random variables X and Y are conditionally independent given the random variable Z if L(X Y,
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 14: Bayes Nets 10/14/2008 Dan Klein UC Berkeley 1 1 Announcements Midterm 10/21! One page note sheet Review sessions Friday and Sunday (similar) OHs on
More informationCS Lecture 3. More Bayesian Networks
CS 6347 Lecture 3 More Bayesian Networks Recap Last time: Complexity challenges Representing distributions Computing probabilities/doing inference Introduction to Bayesian networks Today: D-separation,
More informationAnnouncements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic
CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return
More informationBayesian Approaches Data Mining Selected Technique
Bayesian Approaches Data Mining Selected Technique Henry Xiao xiao@cs.queensu.ca School of Computing Queen s University Henry Xiao CISC 873 Data Mining p. 1/17 Probabilistic Bases Review the fundamentals
More informationA Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004
A Brief Introduction to Graphical Models Presenter: Yijuan Lu November 12,2004 References Introduction to Graphical Models, Kevin Murphy, Technical Report, May 2001 Learning in Graphical Models, Michael
More informationMathematical Foundations of Computer Science Lecture Outline October 18, 2018
Mathematical Foundations of Computer Science Lecture Outline October 18, 2018 The Total Probability Theorem. Consider events E and F. Consider a sample point ω E. Observe that ω belongs to either F or
More informationModeling and reasoning with uncertainty
CS 2710 Foundations of AI Lecture 18 Modeling and reasoning with uncertainty Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square KB systems. Medical example. We want to build a KB system for the diagnosis
More informationLecture 5: Bayesian Network
Lecture 5: Bayesian Network Topics of this lecture What is a Bayesian network? A simple example Formal definition of BN A slightly difficult example Learning of BN An example of learning Important topics
More informationAnnouncements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation
CS 188: Artificial Intelligence Spring 2010 Lecture 15: Bayes Nets II Independence 3/9/2010 Pieter Abbeel UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell, Andrew Moore Current
More information