Reasoning with Bayesian Networks


Lecture 1: Probability Calculus (NICTA and ANU)

Overview of the Course

- Probability calculus, Bayesian networks
- Inference by variable elimination, factor elimination, conditioning
- Models for graph decomposition
- Most likely instantiations
- Complexity of probabilistic inference
- Compiling Bayesian networks
- Inference with local structure
- Applications

Probabilities of Worlds

Worlds are modeled with propositional variables. A world is a complete instantiation of the variables. The probabilities of all worlds add up to 1. An event is a set of worlds.

Probabilities of Events

Pr(α) := Σ_{ω ⊨ α} Pr(ω)

Pr(Earthquake) = Pr(ω1) + Pr(ω2) + Pr(ω3) + Pr(ω4) = .1
Pr(¬Burglary) = .8
Pr(Alarm) = .2442

Propositional Sentences as Events

Pr((Earthquake ∨ Burglary) ∧ ¬Alarm) = ?

Interpret a world ω as a model (satisfying assignment) of a propositional sentence α:

Pr(α) := Σ_{ω ⊨ α} Pr(ω)

Updating Beliefs

Suppose we are given evidence β that Alarm = true. We need to update Pr(·) to Pr(· | β).

- Pr(β | β) = 1 and Pr(¬β | β) = 0
- Partition worlds into those ⊨ β and those ⊭ β
- Pr(ω | β) = 0 for all ω ⊭ β
- Relative beliefs stay put for ω ⊨ β
- Σ_{ω ⊨ β} Pr(ω) = Pr(β), and we need Σ_{ω ⊨ β} Pr(ω | β) = 1
- The above imply we normalize by the constant Pr(β):

Pr(ω | β) := 0 if ω ⊭ β;  Pr(ω) / Pr(β) if ω ⊨ β
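The worlds-and-events definitions above can be sketched in a few lines of Python. The eight-world table is an assumption: it is reconstructed so that it reproduces the probabilities quoted in these slides (Pr(Earthquake) = .1, Pr(Alarm) = .2442, Pr(Burglary | Alarm) ≈ .741), since the original table is not shown here.

```python
# Worlds over (Earthquake, Burglary, Alarm). The numbers are a reconstruction
# chosen to match the probabilities quoted in the slides, not the original table.
worlds = {
    (True,  True,  True):  0.0190,
    (True,  True,  False): 0.0010,
    (True,  False, True):  0.0560,
    (True,  False, False): 0.0240,
    (False, True,  True):  0.1620,
    (False, True,  False): 0.0180,
    (False, False, True):  0.0072,
    (False, False, False): 0.7128,
}

def pr(event):
    """Pr(alpha): sum of Pr(omega) over the worlds omega satisfying the event."""
    return sum(p for w, p in worlds.items() if event(*w))

def pr_given(event, evidence):
    """Bayes conditioning: zero out worlds violating beta, renormalize by Pr(beta)."""
    return sum(p for w, p in worlds.items()
               if event(*w) and evidence(*w)) / pr(evidence)

print(round(pr(lambda e, b, a: e), 3))            # 0.1, i.e. Pr(Earthquake)
print(round(pr_given(lambda e, b, a: b,
                     lambda e, b, a: a), 3))      # 0.741, i.e. Pr(Burglary | Alarm)
```

Any propositional sentence can be passed as the event, e.g. `lambda e, b, a: (e or b) and not a`.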


Bayes Conditioning

Pr(α | β) = Σ_{ω ⊨ α} Pr(ω | β)
          = Σ_{ω ⊨ α, ω ⊨ β} Pr(ω | β) + Σ_{ω ⊨ α, ω ⊭ β} Pr(ω | β)
          = Σ_{ω ⊨ α ∧ β} Pr(ω | β)
          = Σ_{ω ⊨ α ∧ β} Pr(ω) / Pr(β)
          = Pr(α ∧ β) / Pr(β)

Bayes Conditioning: Example

Pr(Burglary) = .2
Pr(Burglary | Alarm) ≈ .741
Pr(Burglary | Alarm ∧ Earthquake) ≈ .253
Pr(Burglary | Alarm ∧ ¬Earthquake) ≈ .957

Independence

Pr(Earthquake) = .1
Pr(Earthquake | Burglary) = .1

Event α is independent of event β if Pr(α | β) = Pr(α) or Pr(β) = 0.

Or, equivalently, Pr(α ∧ β) = Pr(α) Pr(β), which is symmetric.

Conditional Independence

Independence is a dynamic notion: Burglary is independent of Earthquake, but not after accepting the evidence Alarm:

Pr(Burglary | Alarm) ≈ .741
Pr(Burglary | Alarm ∧ Earthquake) ≈ .253

α is independent of β given γ if Pr(α ∧ β | γ) = Pr(α | γ) Pr(β | γ) or Pr(γ) = 0.

Variable Independence

For disjoint sets of variables X, Y, and Z:

I_Pr(X, Z, Y): X is independent of Y given Z in Pr.

This means x is independent of y given z for all instantiations x, y, z of X, Y, Z.

Chain Rule

Pr(α1 ∧ α2 ∧ ... ∧ αn) = Pr(α1 | α2 ∧ ... ∧ αn) Pr(α2 | α3 ∧ ... ∧ αn) ... Pr(αn)

Case Analysis

Pr(α) = Σ_{i=1..n} Pr(α ∧ βi)

Or, equivalently,

Pr(α) = Σ_{i=1..n} Pr(α | βi) Pr(βi)

where the events βi are mutually exclusive and collectively exhaustive.

Case Analysis: Special Case of n = 2

Pr(α) = Pr(α ∧ β) + Pr(α ∧ ¬β)

Or, equivalently,

Pr(α) = Pr(α | β) Pr(β) + Pr(α | ¬β) Pr(¬β)

Bayes Rule

Pr(α | β) = Pr(β | α) Pr(α) / Pr(β)

Bayes Rule: Example

A patient tested positive for a disease.

D: patient has the disease
T: test comes out positive

We would like to know Pr(D | T). We know:

Pr(D) = 1/1000
Pr(T | ¬D) = 2/100 (false positive)
Pr(¬T | D) = 5/100 (false negative)

Using Bayes rule: Pr(D | T) = Pr(T | D) Pr(D) / Pr(T)

Bayes Rule: Computing Pr(T) by Case Analysis

Pr(T) = Pr(T | D) Pr(D) + Pr(T | ¬D) Pr(¬D)
      = (95/100)(1/1000) + (2/100)(999/1000)
      = 2093/100000

Pr(D | T) = 95/2093 ≈ 4.5%
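The computation above can be reproduced directly as a sanity check (a minimal sketch; the variable names are ours):

```python
# Medical test example from the slides: Pr(D) = 1/1000,
# false positive Pr(T | ~D) = 2/100, false negative Pr(~T | D) = 5/100.
pr_d = 1 / 1000
pr_t_given_d = 1 - 5 / 100        # = 95/100, complement of the false-negative rate
pr_t_given_not_d = 2 / 100

# Case analysis: Pr(T) = Pr(T | D) Pr(D) + Pr(T | ~D) Pr(~D)
pr_t = pr_t_given_d * pr_d + pr_t_given_not_d * (1 - pr_d)

# Bayes rule: Pr(D | T) = Pr(T | D) Pr(D) / Pr(T)
pr_d_given_t = pr_t_given_d * pr_d / pr_t
print(round(pr_d_given_t, 3))  # 0.045, i.e. about 4.5%
```

Despite the positive test, the posterior stays small because the disease is rare: the prior Pr(D) dominates.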

Soft Evidence

So far we have considered hard evidence: that some event has occurred. Soft evidence: a neighbor with a hearing problem calls to tell us they heard the alarm. This may not confirm Alarm, but it can still increase our belief in Alarm. There are two main methods to specify the strength of soft evidence.

Soft Evidence: All Things Considered

"After receiving my neighbor's call, my belief in Alarm is now .85."

Soft evidence as a constraint: Pr'(β) = q. This is not a statement about the strength of the evidence per se, but the result of its integration with our initial beliefs.

Scale beliefs in worlds ⊨ β so they add up to q; scale beliefs in worlds ⊭ β so they add up to 1 − q.

Soft Evidence: All Things Considered

For worlds:

Pr'(ω) := (q / Pr(β)) Pr(ω) if ω ⊨ β;  ((1 − q) / Pr(¬β)) Pr(ω) if ω ⊭ β

For events (Jeffrey's rule):

Pr'(α) = q Pr(α | β) + (1 − q) Pr(α | ¬β)

Bayes conditioning is the special case q = 1.
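A minimal sketch of Jeffrey's rule at the level of worlds, with a made-up four-world distribution (the numbers and the helper name are illustrative only):

```python
def jeffrey_update(pr, beta, q):
    """Soft evidence Pr'(beta) = q: scale worlds satisfying beta to total q,
    the remaining worlds to total 1 - q (Bayes conditioning when q = 1)."""
    pr_beta = sum(p for w, p in pr.items() if beta(w))
    return {w: q * p / pr_beta if beta(w) else (1 - q) * p / (1 - pr_beta)
            for w, p in pr.items()}

# Toy distribution over (Alarm, Burglary); the numbers are made up.
pr = {('a', 'b'): 0.10, ('a', '~b'): 0.10, ('~a', 'b'): 0.10, ('~a', '~b'): 0.70}
alarm = lambda w: w[0] == 'a'

updated = jeffrey_update(pr, alarm, 0.85)   # belief in Alarm is now .85
print(round(sum(p for w, p in updated.items() if alarm(w)), 2))  # 0.85
```

With q = 1, the worlds violating Alarm get probability 0, recovering Bayes conditioning.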

Soft Evidence: Nothing Else Considered

We would like to declare the strength of evidence independently of our current beliefs. This requires the notion of the odds of an event β:

O(β) := Pr(β) / Pr(¬β)

Declare evidence β by a Bayes factor k = O'(β) / O(β).

Example: "My neighbor's call increases the odds of Alarm by a factor of 4."

Compute Pr'(β) and fall back on Jeffrey's rule to update beliefs.
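The odds-based update can be sketched as follows (the function name is ours; 0.2442 is the prior Pr(Alarm) quoted earlier in the slides):

```python
def update_by_bayes_factor(pr_beta, k):
    """Declare evidence by Bayes factor k = O'(beta) / O(beta):
    multiply the current odds by k and convert back to a probability."""
    new_odds = k * pr_beta / (1 - pr_beta)   # O'(beta) = k * O(beta)
    return new_odds / (1 + new_odds)         # Pr'(beta) = O'(beta) / (1 + O'(beta))

# The neighbor's call multiplies the odds of Alarm by 4.
q = update_by_bayes_factor(0.2442, 4)
print(round(q, 3))
# q can now be plugged into Jeffrey's rule as the soft-evidence constraint Pr'(Alarm).
```

Note that k = 1 leaves the belief unchanged, and the same Bayes factor yields different posteriors for different priors, which is exactly the sense in which the evidence strength is declared independently of current beliefs.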

Probabilistic Reasoning

Basic task: belief updating in the face of hard and soft evidence.

This can be done in principle using the definitions we have seen, but it is inefficient: it requires joint probability tables, which are exponential in size. There is also a modeling difficulty: specifying joint probabilities is tedious, and beliefs (e.g., independencies) held by human experts are not easily enforced.


Capturing Independence Graphically

Evidence R could increase Pr(A) and Pr(C). But not if we know A: C is independent of R given A.

Capturing Independence Graphically

Parents(V): variables with an edge to V
Descendants(V): variables to which there is a directed path from V
Nondescendants(V): all variables except V, Parents(V), and Descendants(V)

Markovian Assumptions

Every variable is conditionally independent of its nondescendants given its parents:

I(V, Parents(V), Nondescendants(V))

Given the direct causes of a variable, our beliefs in that variable are no longer influenced by any other variable, except possibly by its effects.

Capturing Independence Graphically: Example

I(C, {A}, {B, E, R})
I(R, {E}, {A, B, C})
I(A, {B, E}, {R})
I(B, ∅, {E, R})
I(E, ∅, {B})

Conditional Probability Tables

A DAG G together with Markov(G) declares a set of independencies, but does not define Pr. Pr is uniquely defined if a conditional probability table (CPT) is provided for each variable X.

CPT: Pr(x | u) for every value x of variable X and every instantiation u of its parents U.

Conditional Probability Tables: Example

Pr(c | a), Pr(r | e), Pr(a | b, e), Pr(e), Pr(b)

Conditional Probability Tables

Half of the probabilities are redundant (for binary variables, Pr(¬x | u) = 1 − Pr(x | u)). Only 10 probabilities are needed for the whole DAG.

Bayesian Network

A pair (G, Θ) where:

G is a DAG over variables Z, called the network structure
Θ is a set of CPTs, one for each variable in Z, called the network parametrization

Bayesian Network: Notation

Θ_{X|U}: CPT for variable X and its parents U
X|U: a network family
θ_{x|u} = Pr(x | u): a network parameter
Σ_x θ_{x|u} = 1 for every parent instantiation u

Bayesian Network: Notation

Network instantiation: e.g., z = a, b, c, d, e

θ_{x|u} ~ z: the instantiations xu and z are compatible

θ_a, θ_{b|a}, θ_{c|a}, θ_{d|b,c}, θ_{e|c} are all ~ the z above

Bayesian Network: Semantics

The independence constraints imposed by the network structure (DAG) and the numeric constraints imposed by the network parametrization (CPTs) are satisfied by a unique Pr:

Pr(z) := Π_{θ_{x|u} ~ z} θ_{x|u}   (chain rule)

A Bayesian network is an implicit representation of this probability distribution.

Chain Rule: Example

Pr(z) := Π_{θ_{x|u} ~ z} θ_{x|u}   (chain rule)

Pr(a, b, c, d, e) = θ_a θ_{b|a} θ_{c|a} θ_{d|b,c} θ_{e|c} = (.6)(.2)(.2)(.9)(.1) = .00216
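Numerically, the chain-rule value is just a product of the compatible parameters. The five values are the ones quoted on the slide; the literal names in the comments are assumptions, since the slide's CPTs are not reproduced here.

```python
import math

# theta_a, theta_{b|a}, theta_{c|a}, theta_{d|b,c}, theta_{e|c}: the compatible
# parameters for one complete network instantiation (values from the slide).
params = [0.6, 0.2, 0.2, 0.9, 0.1]

pr_z = math.prod(params)   # chain rule: Pr(z) is the product of these parameters
print(round(pr_z, 5))      # 0.00216
```

One such product per complete instantiation defines the entire joint distribution, without ever materializing the exponential joint table.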

Compactness of Bayesian Networks

Let d be the maximum variable domain size and k the maximum number of parents.

Size of any CPT: O(d^(k+1))
Total number of network parameters: O(n d^(k+1)) for an n-variable network

Compact as long as k is relatively small, compared with a joint probability table (up to d^n entries).

Properties of the Represented Distribution

The distribution Pr represented by a Bayesian network (G, Θ) is guaranteed to satisfy every independence assumption in Markov(G):

I_Pr(X, Parents(X), Nondescendants(X))

It may satisfy more. Further independencies can be derived using the graphoid axioms.

Graphoid Axioms: Symmetry

I_Pr(X, Z, Y) iff I_Pr(Y, Z, X)

If learning y does not influence our belief in x, then learning x does not influence our belief in y either.

Graphoid Axioms: Decomposition

I_Pr(X, Z, Y ∪ W) implies I_Pr(X, Z, Y) and I_Pr(X, Z, W)

If learning yw does not influence our belief in x, then learning y alone, or w alone, does not influence our belief in x either. If some information is irrelevant, then any part of it is also irrelevant.

The reverse does not hold in general: two pieces of information may each be irrelevant on their own, yet their combination may be relevant.

Graphoid Axioms: Decomposition (continued)

I_Pr(X, Z, Y ∪ W) implies I_Pr(X, Z, Y) and I_Pr(X, Z, W)

In particular, Markov(G) can be strengthened to

I_Pr(X, Parents(X), W) for any W ⊆ Nondescendants(X)

This is useful for proving the chain rule for Bayesian networks from the chain rule of probability calculus.

Graphoid Axioms: Weak Union

I_Pr(X, Z, Y ∪ W) implies I_Pr(X, Z ∪ Y, W)

If the information yw is not relevant to our belief in x, then the partial information y will not make the rest of the information, w, relevant.

In particular, Markov(G) can be strengthened to

I_Pr(X, Parents(X) ∪ W, Nondescendants(X) \ W) for any W ⊆ Nondescendants(X)

Graphoid Axioms: Contraction

I_Pr(X, Z, Y) and I_Pr(X, Z ∪ Y, W) imply I_Pr(X, Z, Y ∪ W)

If, after learning the irrelevant information y, the information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant to start with.

Graphoid Axioms: Intersection

I_Pr(X, Z ∪ W, Y) and I_Pr(X, Z ∪ Y, W) imply I_Pr(X, Z, Y ∪ W), for strictly positive distributions Pr

If the information w is irrelevant given y, and the information y is irrelevant given w, then their combination is irrelevant to start with.

This is not true in general when there are zero probabilities (determinism).

Graphical Test of Independence

The graphoid axioms produce independencies beyond Markov(G), but deriving independencies from them is nontrivial; we need an efficient method. d-separation is a linear-time graphical test.

d-separation

dsep_G(X, Z, Y): every path between X and Y is blocked by Z

Soundness: dsep_G(X, Z, Y) implies I_Pr(X, Z, Y) for every Pr induced by G.

d-separation also enjoys weak completeness (discussed later).

d-separation: Paths and Valves (figures omitted in this transcription)

d-separation: Path Blocking

A path is blocked if any valve on it is closed:

Sequential valve W: closed iff W ∈ Z
Divergent valve W: closed iff W ∈ Z
Convergent valve W: closed iff W ∉ Z and none of Descendants(W) appears in Z

d-separation: Examples (figures omitted in this transcription)

d-separation: Complexity

There is no need to enumerate the paths between X and Y. dsep_G(X, Z, Y) holds iff X and Y are disconnected in the new DAG G' obtained by pruning DAG G as follows:

- Repeatedly delete any leaf W not in X ∪ Y ∪ Z
- Delete all edges outgoing from Z

This test is linear in the size of G.
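The pruning-based test can be sketched directly (a sketch, with the DAG given as a parents map; the node names follow the alarm example used earlier in the slides):

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """dsep_G(X, Z, Y) via pruning: repeatedly delete leaves outside X, Y, Z,
    delete edges outgoing from Z, then check X and Y are disconnected."""
    X, Y, Z = set(X), set(Y), set(Z)
    pa = {v: set(ps) for v, ps in parents.items()}
    keep = X | Y | Z
    while True:  # repeatedly delete leaf nodes not in X, Y, or Z
        has_child = {p for ps in pa.values() for p in ps}
        leaves = [v for v in pa if v not in has_child and v not in keep]
        if not leaves:
            break
        for v in leaves:
            del pa[v]
    for v in pa:  # delete all edges outgoing from Z (edges z -> v)
        pa[v] -= Z
    adj = {v: set() for v in pa}  # undirected adjacency of the pruned DAG
    for v, ps in pa.items():
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
    seen = set(X) & set(pa)
    queue = deque(seen)
    while queue:  # BFS: are X and Y disconnected?
        v = queue.popleft()
        if v in Y:
            return False
        for u in adj[v] - seen:
            seen.add(u)
            queue.append(u)
    return True

# DAG of the alarm example: E -> R, E -> A, B -> A, A -> C.
parents = {'E': set(), 'B': set(), 'R': {'E'}, 'A': {'E', 'B'}, 'C': {'A'}}
print(d_separated(parents, {'C'}, {'B', 'E', 'R'}, {'A'}))  # True
print(d_separated(parents, {'B'}, {'E'}, {'A'}))            # False
```

The second query is False because conditioning on A opens the convergent valve B -> A <- E, matching the "independence is dynamic" example above.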

d-separation: Soundness

dsep_G(X, Z, Y) implies I_Pr(X, Z, Y) for every Pr induced by G.

This is proved by showing that every independence claimed by d-separation can be derived from the graphoid axioms.

d-separation: Completeness?

Does I_Pr(X, Z, Y) imply dsep_G(X, Z, Y)? No.

Consider the chain X → Y → Z. X is not d-separated from Z (given ∅). However, if θ_{y|x} = θ_{y|¬x}, then X and Y are independent, and so are X and Z.

This is not surprising, as d-separation is based on the structure alone.

d-separation: Weak Completeness

For every DAG G, there exists a parametrization Θ such that

dsep_G(X, Z, Y) iff I_Pr(X, Z, Y)

where Pr is induced by the Bayesian network (G, Θ). This implies that one cannot improve on d-separation as a structure-based test.

Summary

Probability calculus: probabilities of worlds and events, Bayes conditioning, independence, the chain rule, case analysis, Bayes rule, and two methods for soft evidence.

Bayesian networks: capturing independence graphically, Markovian assumptions, conditional probability tables, the chain rule for Bayesian networks, graphoid axioms, and d-separation.