Bayesian Networks: Independencies and Inference


Bayesian Networks: Independencies and Inference. Scott Davies and Andrew Moore. Note to other teachers and users of these slides: Andrew and Scott would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

What Independencies does a Bayes Net Model? In order for a Bayesian network to model a probability distribution, the following must be true by definition: each variable is conditionally independent of all its non-descendants in the graph given the values of all its parents. This implies the familiar factorization of the joint distribution, P(X1, ..., Xn) = P(X1 | Parents(X1)) × ... × P(Xn | Parents(Xn)). But what else does it imply?

What Independencies does a Bayes Net Model? Example: the chain Z → Y → X. Given Y, does learning the value of Z tell us nothing new about X? I.e., is P(X | Y, Z) equal to P(X | Y)? Yes. Since we know the value of all of X's parents (namely, Y), and Z is not a descendant of X, X is conditionally independent of Z. Also, since independence is symmetric, P(Z | Y, X) = P(Z | Y).
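For readers who want to see this concretely, here is a brute-force numeric check of the two slides above (a sketch with entirely made-up CPT numbers): for the chain Z → Y → X, the conditional-independence property gives the factorization P(Z, Y, X) = P(Z) P(Y | Z) P(X | Y), and summing out the joint confirms that P(X | Y, Z) = P(X | Y).

```python
from itertools import product

# Made-up CPTs for the chain Z -> Y -> X (all variables Boolean)
p_z = {True: 0.3, False: 0.7}
p_y_given_z = {True: {True: 0.9, False: 0.1},   # P(Y=y | Z=z) = p_y_given_z[z][y]
               False: {True: 0.2, False: 0.8}}
p_x_given_y = {True: {True: 0.6, False: 0.4},   # P(X=x | Y=y) = p_x_given_y[y][x]
               False: {True: 0.1, False: 0.9}}

# The conditional-independence property yields the factorization of the joint
joint = {(z, y, x): p_z[z] * p_y_given_z[z][y] * p_x_given_y[y][x]
         for z, y, x in product([True, False], repeat=3)}

def cond_prob_x(x_val, given):
    """P(X = x_val | given), where `given` maps a subset of {'Z', 'Y'} to values."""
    idx = {"Z": 0, "Y": 1}
    match = {k: v for k, v in joint.items()
             if all(k[idx[var]] == val for var, val in given.items())}
    return sum(v for k, v in match.items() if k[2] == x_val) / sum(match.values())

# Learning Z adds nothing once Y is known: both numbers come out ≈ 0.6
print(cond_prob_x(True, {"Y": True, "Z": True}), cond_prob_x(True, {"Y": True}))
```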

What Independencies does a Bayes Net Model? Let I<X, Y, Z> represent "X and Z are conditionally independent given Y." For the same kind of network (Y is X's only parent, and Z is not a descendant of X): I<X, Y, Z>? Yes, just as in the previous example: all of X's parents are given, and Z is not a descendant.

What Independencies does a Bayes Net Model? Now consider a network in which Z is connected to X by two paths, one through U and one through V. I<X, {U}, Z>? No. I<X, {U, V}, Z>? Yes. Maybe I<X, S, Z> iff S acts as a cutset between X and Z in an undirected version of the graph?

Things get a little more confusing. Consider the network X → Y ← Z. X has no parents, so we know all its parents' values trivially. Z is not a descendant of X. So I<X, {}, Z>, even though there's an undirected path from X to Z through an unknown variable Y. What if we do know the value of Y, though? Or one of its descendants?

The Burglar Alarm example: Burglar → Alarm ← Earthquake, with Alarm → Phone Call. Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes. Earth arguably doesn't care whether your house is currently being burgled. While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing. Uh oh!

Things get a lot more confusing. But now suppose you learn that there was a medium-sized earthquake in your neighborhood. Oh, whew! Probably not a burglar after all. Earthquake "explains away" the hypothetical burglar. But then it must not be the case that I<Burglar, {Phone Call}, Earthquake>, even though I<Burglar, {}, Earthquake>!

d-separation to the rescue Fortunately, there is a relatively simple algorithm for determining whether two variables in a Bayesian network are conditionally independent: d-separation. Definition: X and Z are d-separated by a set of evidence variables E iff every undirected path from X to Z is blocked, where a path is blocked iff one or more of the following conditions is true:...

A path is blocked when...
- There exists a variable V on the path such that it is in the evidence set E and the arcs putting V in the path are tail-to-tail (both arcs point away from V: ← V →), or
- There exists a variable V on the path such that it is in the evidence set E and the arcs putting V in the path are tail-to-head (a chain through V: → V →), or
- ...

A path is blocked when... (the funky case)
- Or, there exists a variable V on the path such that it is NOT in the evidence set E, neither are any of its descendants, and the arcs putting V on the path are head-to-head (both arcs point into V: → V ←).

d-separation to the rescue, cont'd. Theorem [Verma & Pearl, 1988]: If a set of evidence variables E d-separates X and Z in a Bayesian network's graph, then I<X, E, Z>. d-separation can be computed in linear time using a depth-first-search-like algorithm. Great! We now have a fast algorithm for automatically inferring whether learning the value of one variable might give us any additional hints about some other variable, given what we already know. "Might": variables may actually be independent even when they're not d-separated, depending on the actual probabilities involved.
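The blocking rules above translate almost directly into code. Below is a minimal sketch (not the linear-time DFS-style algorithm mentioned on this slide, just a brute-force check over undirected paths); the function names and the dictionary encoding of the graph are assumptions of this sketch, not part of the original slides.

```python
def descendants(node, children):
    """All descendants of `node` under the parent -> children relation."""
    out, stack = set(), list(children.get(node, []))
    while stack:
        n = stack.pop()
        if n not in out:
            out.add(n)
            stack.extend(children.get(n, []))
    return out

def d_separated(x, z, evidence, parents):
    """True iff every undirected path from x to z is blocked given the evidence set.

    `parents` maps each node to a list of its parents; `evidence` is a set of
    observed nodes.  Exponential in the worst case, unlike the linear-time version.
    """
    children = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(node)

    def neighbors(n):
        return set(parents.get(n, [])) | set(children.get(n, []))

    def paths(current, visited):
        # all simple undirected paths from `current` to z
        if current == z:
            yield visited
            return
        for n in neighbors(current):
            if n not in visited:
                yield from paths(n, visited + [n])

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, v, nxt = path[i - 1], path[i], path[i + 1]
            head_to_head = prev in parents.get(v, []) and nxt in parents.get(v, [])
            if head_to_head:
                # the "funky" case: blocked only if neither v nor any of its descendants is observed
                if v not in evidence and not (descendants(v, children) & evidence):
                    return True
            elif v in evidence:
                return True   # tail-to-tail or tail-to-head at an observed node
        return False

    return all(blocked(p) for p in paths(x, [x]))

# The burglar-alarm network from the earlier slides:
parents = {"Burglar": [], "Earthquake": [], "Alarm": ["Burglar", "Earthquake"],
           "PhoneCall": ["Alarm"]}
print(d_separated("Burglar", "Earthquake", set(), parents))           # True
print(d_separated("Burglar", "Earthquake", {"PhoneCall"}, parents))   # False: explaining away
```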

d-separation example. [Figure: an example network over the variables A through J.] I<C, {}, D>? I<C, {A}, D>? I<C, {A, B}, D>? I<C, {A, B, J}, D>? I<C, {A, B, E, J}, D>?

Bayesian Network Inference. Inference: calculating P(X | Y) for some variables or sets of variables X and Y. Inference in Bayesian networks is #P-hard! Counting satisfying assignments reduces to inference: give the inputs I1, ..., I5 prior probabilities of 0.5 and wire them into a network encoding a Boolean formula with output node O. How many satisfying assignments does the formula have? P(O) must be (# satisfying assignments) × (0.5 ^ # inputs), so computing P(O) is at least as hard as counting them.
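A quick numeric illustration of the reduction (the formula below is made up): with fair-coin priors on the inputs, the output's marginal probability is exactly the number of satisfying assignments scaled by 0.5 per input, so an exact inference engine would count satisfying assignments for free.

```python
from itertools import product

def prob_output_true(n_inputs, formula):
    """P(O = true) when each input is an independent fair coin and O = formula(inputs)."""
    sat = sum(formula(*bits) for bits in product([False, True], repeat=n_inputs))
    return sat * 0.5 ** n_inputs   # (# satisfying assignments) * (0.5 ^ # inputs)

# Hypothetical 3-input formula (I1 or I2) and not I3: it has 3 satisfying assignments
print(prob_output_true(3, lambda i1, i2, i3: (i1 or i2) and not i3))   # 3 * 0.125 = 0.375
```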

Bayesian Network Inference. But inference is still tractable in some cases. Let's look at a special class of networks: trees / forests, in which each node has at most one parent.

Decomposing the probabilities. Suppose we want P(X_i | E), where E is some set of evidence variables. Let's split E into two parts: E_i^- is the part consisting of assignments to variables in the subtree rooted at X_i, and E_i^+ is the rest of it.

Decomposing the probabilities, cont'd. Applying Bayes' rule, and then using the fact that E_i^- is conditionally independent of E_i^+ given X_i (X_i separates its subtree from the rest of the tree):

P(X_i | E) = P(X_i | E_i^-, E_i^+)
           = P(E_i^- | X_i, E_i^+) P(X_i | E_i^+) / P(E_i^- | E_i^+)
           = P(E_i^- | X_i) P(X_i | E_i^+) / P(E_i^- | E_i^+)
           = α π(X_i) λ(X_i)

where: α is a constant independent of X_i; π(X_i) = P(X_i | E_i^+); λ(X_i) = P(E_i^- | X_i).

Using the decomposition for inference. We can use this decomposition to do inference as follows. First, compute λ(X_i) = P(E_i^- | X_i) for all X_i recursively, using the leaves of the tree as the base case. If X_i is a leaf:
- If X_i is in E: λ(X_i) = 1 if X_i matches E, 0 otherwise.
- If X_i is not in E: E_i^- is the null set, so P(E_i^- | X_i) = 1 (constant).

Quick aside: Virtual evidence. For theoretical simplicity, but without loss of generality, let's assume that all variables in E (the evidence set) are leaves in the tree. Why we can do this WLOG: observing an internal node X_i is equivalent to attaching a new leaf child X_i' to X_i with the deterministic CPT P(X_i' | X_i) = 1 if X_i' = X_i, 0 otherwise, and observing X_i' instead.
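As a tiny illustration of the trick (names hypothetical), the deterministic CPT for such a virtual leaf can be built in one line:

```python
def virtual_evidence_cpt(values_of_x):
    """CPT of a hypothetical leaf X' attached to X: P(X' = x' | X = x) = 1 iff x' == x."""
    return {(x_obs, x): 1.0 if x_obs == x else 0.0
            for x_obs in values_of_x for x in values_of_x}

print(virtual_evidence_cpt([True, False]))   # e.g. for a Boolean X
```

Observing X = true in the original network is then the same as leaving X unobserved and observing X' = true on the new leaf.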

Calculating λ(X_i) for non-leaves. Suppose X_i has one child, X_c. Then, summing over the possible values x_c of X_c (and using the fact that, with X_i itself unobserved, E_i^- is just E_c^-, which is independent of X_i given X_c):

λ(X_i) = P(E_i^- | X_i)
       = Σ_xc P(E_i^-, x_c | X_i)
       = Σ_xc P(x_c | X_i) P(E_i^- | x_c, X_i)
       = Σ_xc P(x_c | X_i) P(E_c^- | x_c)
       = Σ_xc P(x_c | X_i) λ(x_c)

Calculating λ(X_i) for non-leaves. Now, suppose X_i has a set of children, C. Since X_i d-separates each of its subtrees, the contribution of each subtree to λ(X_i) is independent:

λ(X_i) = P(E_i^- | X_i) = ∏_{X_j ∈ C} λ_j(X_i)

where λ_j(X_i) is the contribution to P(E_i^- | X_i) of the part of the evidence lying in the subtree rooted at one of X_i's children X_j, computed exactly as in the one-child case: λ_j(X_i) = Σ_xj P(x_j | X_i) λ(x_j).

We are now λ-happy. So now we have a way to recursively compute all the λ(X_i)'s, starting from the root and using the leaves as the base case. If we want, we can think of each node in the network as an autonomous processor that passes a little λ message to its parent. [Figure: λ messages flowing up the tree from the leaves toward the root.]

The other half of the problem. Remember, P(X_i | E) = α π(X_i) λ(X_i). Now that we have all the λ(X_i)'s, what about the π(X_i)'s? π(X_i) = P(X_i | E_i^+). What about the root of the tree, X_r? In that case, E_r^+ is the null set, so π(X_r) = P(X_r). No sweat. So for an arbitrary X_i with parent X_p, let's inductively assume we know π(X_p) and/or P(X_p | E). How do we get π(X_i)?

Computing π(X_i). Let X_p be X_i's parent. Conditioning on X_p, and using the fact that X_i is independent of the evidence E_i^+ outside its own subtree once X_p is given:

π(X_i) = P(X_i | E_i^+)
       = Σ_xp P(X_i | x_p, E_i^+) P(x_p | E_i^+)
       = Σ_xp P(X_i | x_p) P(x_p | E_i^+)
       = Σ_xp P(X_i | x_p) π_i(x_p)

where π_i(X_p) is defined as P(X_p | E_i^+): the message X_p sends to X_i, summarizing all the evidence except the part lying in X_i's own subtree. It can be computed from quantities we already have, since π_i(X_p) ∝ π(X_p) ∏_{j ≠ i} λ_j(X_p) (the product running over X_p's other children), or equivalently π_i(X_p) ∝ P(X_p | E) / λ_i(X_p) wherever λ_i(X_p) is nonzero.

We're done. Yay! Thus we can compute all the π(X_i)'s, and, in turn, all the P(X_i | E)'s. We can think of the nodes as autonomous processors passing λ messages up the tree and π messages down the tree to their neighbors.
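Putting the pieces together, here is a minimal, unoptimized Python sketch of the λ/π scheme for tree-structured networks (each node has at most one parent). The data layout, function names, and the small example network are all assumptions made for illustration, and messages are recomputed rather than cached, so this is not the linear-time message-passing formulation.

```python
from math import prod

def tree_query(query_node, parent, cpt, values, evidence):
    """P(query_node | evidence) in a tree-structured Bayes net.

    parent:   node -> its single parent (None for the root)
    cpt:      node -> {(value, parent_value): P(node = value | parent = parent_value)};
              the root's CPT uses parent_value None and holds its prior
    values:   node -> list of possible values
    evidence: node -> observed value
    """
    children = {}
    for n, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(n)

    def observed(node, x):
        # 0/1 indicator for evidence on the node itself (1 if the node is unobserved)
        return 1.0 if node not in evidence or x == evidence[node] else 0.0

    def child_msg(child, xp):
        # contribution of `child`'s subtree to its parent's lambda: sum_xc P(xc | xp) lambda(xc)
        lc = lam(child)
        return sum(cpt[child][(xc, xp)] * lc[xc] for xc in values[child])

    def lam(node):
        # lambda(x) = P(evidence in the subtree rooted at node | node = x)
        return {x: observed(node, x) * prod(child_msg(c, x) for c in children.get(node, []))
                for x in values[node]}

    def pi(node):
        # pi(x) proportional to P(node = x, evidence outside node's subtree)
        p = parent[node]
        if p is None:
            return {x: cpt[node][(x, None)] for x in values[node]}   # root: just the prior
        parent_pi = pi(p)
        # combine the parent's pi, its own evidence, and lambda-messages from its *other* children
        msg = {xp: parent_pi[xp] * observed(p, xp) *
                   prod(child_msg(s, xp) for s in children[p] if s != node)
               for xp in values[p]}
        return {x: sum(cpt[node][(x, xp)] * msg[xp] for xp in values[p])
                for x in values[node]}

    l, q = lam(query_node), pi(query_node)
    unnormalized = {x: l[x] * q[x] for x in values[query_node]}
    total = sum(unnormalized.values())
    return {x: v / total for x, v in unnormalized.items()}

# A made-up example tree: Cloudy -> Sprinkler and Cloudy -> Rain
parent = {"Cloudy": None, "Sprinkler": "Cloudy", "Rain": "Cloudy"}
values = {n: [True, False] for n in parent}
cpt = {
    "Cloudy":    {(True, None): 0.5, (False, None): 0.5},
    "Sprinkler": {(True, True): 0.1, (False, True): 0.9,
                  (True, False): 0.5, (False, False): 0.5},
    "Rain":      {(True, True): 0.8, (False, True): 0.2,
                  (True, False): 0.2, (False, False): 0.8},
}
print(tree_query("Sprinkler", parent, cpt, values, {"Rain": True}))   # ≈ {True: 0.18, False: 0.82}
```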

Conjunctive queries. What if we want, e.g., P(A, B | C) instead of just the marginal distributions P(A | C) and P(B | C)? Just use the chain rule: P(A, B | C) = P(A | C) P(B | A, C). Each of the latter probabilities can be computed using the technique just discussed.
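Reusing the hypothetical tree_query sketch and example network from above, a conjunctive query can be answered exactly this way:

```python
# P(Sprinkler=T, Rain=T | Cloudy=T) = P(Sprinkler=T | Cloudy=T) * P(Rain=T | Sprinkler=T, Cloudy=T)
p_a = tree_query("Sprinkler", parent, cpt, values, {"Cloudy": True})[True]
p_b = tree_query("Rain", parent, cpt, values, {"Cloudy": True, "Sprinkler": True})[True]
print(p_a * p_b)   # ≈ 0.1 * 0.8 = 0.08
```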