Bayesian Networks
Brandon Malone


Much of this material is adapted from Chapter 4 of Darwiche's book.
January 23, 2014

Outline
1 Preliminaries
2 Bayesian Networks
3 Graphoid Axioms
4 d-separation
5 Wrap-up

Graph concepts and terminology
We have a directed acyclic graph in which the set of nodes represents random variables, X.
(Example DAG: Winter? (A), Sprinkler? (B), Rain? (C), Wet Grass? (D), Slippery Road? (E).)
Pa_X: the parents of variable/node X
Desc_X: the descendants of X
NonDesc_X: the non-descendants of X, i.e., X \ ({X} ∪ Pa_X ∪ Desc_X)

Graph concepts and terminology
Trail or pipe. Any sequence of edges which connects two variables.
Example: Sprinkler → Wet Grass ← Rain → Slippery Road
N.B. The direction of the edges is not considered.
Valve. A variable in a trail.

Probability terminology and notation
We represent each conditional probability distribution as a table, called a conditional probability table (CPT).

A  B  Θ_{B|A}
T  T  0.20
T  F  0.80
F  T  0.75
F  F  0.25

Family. The variable X together with its parents Pa_X; here, B and {A}.
Parameters. The conditional probabilities Pr(X = x | Pa_X = pa), often denoted θ_{x|pa}.
Each instantiation pa of Pa_X gives a different conditional distribution over X, so Σ_x θ_{x|pa} = 1 for each pa.
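To make this concrete, here is a small sketch (mine, not course code) of the table Θ_{B|A} as a Python dict keyed by the parent's value, with a check of the normalization constraint just stated:

    # Theta_{B|A}: one conditional distribution over B per instantiation of A.
    theta_B_given_A = {
        True:  {True: 0.20, False: 0.80},  # Pr(B | A=T)
        False: {True: 0.75, False: 0.25},  # Pr(B | A=F)
    }

    # Each parent instantiation pa gives a proper distribution over B:
    # sum_x theta_{x|pa} = 1.
    for pa, dist in theta_B_given_A.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9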

Probability terminology and notation
Consider again the CPT Θ_{B|A} above.
Compatibility. A parameter θ_{x|pa} is compatible with a (partial) instantiation z if they assign the same values to their common variables. We write θ_{x|pa} ∼ z to indicate compatibility.
Conditional independence. I(X, Z, Y) means that X is independent of Y given Z.

Factorized distributions
How can we use the chain rule to write Pr(A, B, C, D, E)?
Pr(A, B, C, D, E) = Pr(A) Pr(B | A) Pr(C | A, B) Pr(D | A, B, C) Pr(E | A, B, C, D)
How many parameters does this require?
What if I(E, {C}, {A, B, D})?
Pr(A, B, C, D, E) = Pr(A) Pr(B | A) Pr(C | A, B) Pr(D | A, B, C) Pr(E | C)
How many parameters does this require?
What if (additionally) I(C, {A}, {B}) and I(D, {B, C}, {A})?
Pr(A, B, C, D, E) = Pr(A) Pr(B | A) Pr(C | A) Pr(D | B, C) Pr(E | C)
How many parameters does this require?
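To make the counting questions concrete, here is a small sketch (mine, assuming all five variables are binary): each factor Pr(X | parents) contributes 2^|parents| independent parameters, since one entry of each row is fixed by normalization.

    # Independent parameters of a factorization over binary variables:
    # each factor Pr(X | parents) needs (2 - 1) * 2**len(parents) numbers.
    def num_params(factorization):
        return sum(2 ** len(parents) for parents in factorization.values())

    chain_rule  = {"A": "", "B": "A", "C": "AB", "D": "ABC", "E": "ABCD"}
    after_first = {"A": "", "B": "A", "C": "AB", "D": "ABC", "E": "C"}
    final       = {"A": "", "B": "A", "C": "A",  "D": "BC",  "E": "C"}

    print(num_params(chain_rule))   # 31 = 2**5 - 1
    print(num_params(after_first))  # 17
    print(num_params(final))        # 11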

Graphical structures as factorized distributions
What is the relationship between the factorized distribution and the graphical structure?
Pr(A, B, C, D, E) = Pr(A) Pr(B | A) Pr(C | A) Pr(D | B, C) Pr(E | C)
(The DAG: Winter? (A) with children Sprinkler? (B) and Rain? (C); Wet Grass? (D) with parents B and C; Slippery Road? (E) with parent C.)
The graph structure encodes the conditional independencies: each factor Pr(X | Pa_X) corresponds to node X and its parents in the graph.

Local Markov property
Local Markov property. Given a DAG structure G, we interpret G as compactly representing the following independence statements: I(X, Pa_X, NonDesc_X) for all variables X in DAG G.
These conditional independencies are denoted Markov(G).

Local Markov property - Simple example
What conditional independencies are implied by the local Markov property for this DAG?
(The DAG: Earthquake? (E) and Burglary? (B) are parents of Alarm? (A); Earthquake is also a parent of Radio? (R); Alarm is a parent of Call? (C).)
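The exercise can be answered mechanically by computing each node's parents and non-descendants. A sketch using the edges reconstructed above (my encoding, not course code):

    # Local Markov property: I(X, Pa_X, NonDesc_X) for every node X.
    edges = {("E", "R"), ("E", "A"), ("B", "A"), ("A", "C")}
    nodes = {"E", "B", "R", "A", "C"}

    def descendants(x):
        children = {c for (p, c) in edges if p == x}
        out = set(children)
        for c in children:
            out |= descendants(c)
        return out

    for x in sorted(nodes):
        pa = {p for (p, c) in edges if c == x}
        nondesc = nodes - {x} - pa - descendants(x)
        print(f"I({x}, {sorted(pa)}, {sorted(nondesc)})")

Running it prints, for example, I(R, ['E'], ['A', 'B', 'C']) and I(C, ['A'], ['B', 'E', 'R']).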

Bayesian network definition
A Bayesian network for a set of random variables X is a pair (G, Θ), where
G is a DAG over X, called the structure;
Θ is a set of CPDs (always CPTs in this course), one for each X ∈ X, called the parameterization.

Sample Bayesian network
Structure: the DAG with edges A → B, A → C, B → D, C → D, C → E
(Winter? (A), Sprinkler? (B), Rain? (C), Wet Grass? (D), Slippery Road? (E)).

Parameterization:

A  Θ_A
T  .6
F  .4

A  B  Θ_{B|A}
T  T  .2
T  F  .8
F  T  .75
F  F  .25

A  C  Θ_{C|A}
T  T  .8
T  F  .2
F  T  .1
F  F  .9

B  C  D  Θ_{D|B,C}
T  T  T  .95
T  T  F  .05
T  F  T  .9
T  F  F  .1
F  T  T  .8
F  T  F  .2
F  F  T  0
F  F  F  1

C  E  Θ_{E|C}
T  T  .7
T  F  .3
F  T  0
F  F  1

Chain rule of Bayesian networks
Given a Bayesian network B and an instantiation z,
Pr(z | B) = ∏_{θ_{x|pa} ∼ z} θ_{x|pa}
That is, the probability of z is the product, over all variables, of the probability of each variable's value given the values of its parents.
In the future, we will typically omit B unless we need to distinguish between networks.
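A minimal sketch of this chain rule in Python (my encoding, not the course's), using the sample network from the previous slide; pr multiplies, for each family, the single parameter compatible with the instantiation z:

    T, F = True, False
    parents = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}
    cpt = {
        "A": {(): {T: .6, F: .4}},
        "B": {(T,): {T: .2, F: .8}, (F,): {T: .75, F: .25}},
        "C": {(T,): {T: .8, F: .2}, (F,): {T: .1, F: .9}},
        "D": {(T, T): {T: .95, F: .05}, (T, F): {T: .9, F: .1},
              (F, T): {T: .8, F: .2},   (F, F): {T: 0.0, F: 1.0}},
        "E": {(T,): {T: .7, F: .3}, (F,): {T: 0.0, F: 1.0}},
    }

    def pr(z):
        """Chain rule: the product of the parameters compatible with z."""
        p = 1.0
        for x, pa in parents.items():
            p *= cpt[x][tuple(z[u] for u in pa)][z[x]]
        return p

    print(pr({"A": T, "B": T, "C": F, "D": T, "E": F}))  # .6*.2*.2*.9*1 = 0.0216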

Class work
Suppose a Bayesian network has n variables, each variable takes up to d values, and no variable has more than k parents.
How many parameters does an explicit distribution require? How many parameters does the Bayesian network require? Use O(·) notation.
Using the network on the handout, compute the following probabilities. Remember marginalization and Bayes' rule.
Pr(A = T, B = T, C = F, D = T, E = F)
Pr(A = T, B = T, C = F)
Pr(A = T, B = T | C = F)
Pr(A = T, B = T | C = F, D = T, E = F)
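As a partial check (assuming the handout network is the sample network above): an explicit joint over n variables with up to d values needs O(d^n) parameters, while the Bayesian network needs at most n CPTs with d^(k+1) entries each, i.e. O(n d^(k+1)). The first query then follows directly from the chain rule:

Pr(A=T, B=T, C=F, D=T, E=F)
  = Pr(A=T) Pr(B=T | A=T) Pr(C=F | A=T) Pr(D=T | B=T, C=F) Pr(E=F | C=F)
  = 0.6 · 0.2 · 0.2 · 0.9 · 1 = 0.0216

The remaining queries follow by summing such terms over the unmentioned variables (marginalization) and dividing (definition of conditional probability).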

Graphoid axioms
The local Markov property tells us that I(X, Pa_X, NonDesc_X) for all variables X in DAG G. The graphoid axioms allow us to derive global independencies based on the graph structure:
Symmetry
Decomposition
Weak union
Contraction
Intersection

Symmetry
If learning something about Y tells us nothing about X, then learning something about X tells us nothing about Y.
I(X, Z, Y) ⟺ I(Y, Z, X)
Note that conditional independence is always w.r.t. some set of variables Z as evidence.

Decomposition
If learning something about Y ∪ W tells us nothing about X, then learning something about Y or W individually tells us nothing about X.
I(X, Z, Y ∪ W) ⟹ I(X, Z, Y) and I(X, Z, W)
This allows us to reason about subsets. In particular, I(X, Pa_X, W) for all W ⊆ NonDesc_X. Given the topological ordering on the variables, this axiom proves the chain rule for Bayesian networks.

Weak union
If learning something about Y ∪ W tells us nothing about X, then learning Y will not make W relevant.
I(X, Z, Y ∪ W) ⟹ I(X, Z ∪ Y, W)
In particular, X is independent of W ⊆ NonDesc_X given Pa_X and the other non-descendants.

Contraction
If Y tells us nothing about X, and learning something about W after learning Y still tells us nothing about X, then the combined information Y ∪ W was irrelevant to begin with.
I(X, Z, Y) and I(X, Z ∪ Y, W) ⟹ I(X, Z, Y ∪ W)

Intersection
If W is irrelevant given Y and Y is irrelevant given W, then both Y and W were irrelevant to begin with.
I(X, Z ∪ W, Y) and I(X, Z ∪ Y, W) ⟹ I(X, Z, Y ∪ W)
This holds only when the distribution is strictly positive.

Paths and valves
A pipe is a path from one variable to another. A pipe is composed of three types of valves:
Chain: X → W → Y
Common cause: X ← W → Y
Common effect: X → W ← Y
The common effect valve is often referred to as a v-structure when X and Y are not connected.

Open and closed valves
We can view independence as blocked flow through a pipe. In particular, X and Y are independent given Z if all pipes between them are closed. A pipe is closed if any of its valves is closed.
Chain X → W → Y: closed iff W ∈ Z
Common cause X ← W → Y: closed iff W ∈ Z
Common effect X → W ← Y: closed iff W ∉ Z and Desc_W ∩ Z = ∅
Formally, this is called d-separation and is written dsep(X, Z, Y).
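A small sketch (not from the slides) of the valve test: given the DAG's edge set, classify the valve at W on a trail segment X-W-Y and report whether it is closed given Z.

    def descendants(edges, w):
        children = {c for (p, c) in edges if p == w}
        out = set(children)
        for c in children:
            out |= descendants(edges, c)
        return out

    def valve_closed(edges, x, w, y, Z):
        """True iff the valve at w on the trail segment x-w-y is closed given Z."""
        if (x, w) in edges and (y, w) in edges:
            # Common effect x -> w <- y: closed iff neither w nor any
            # descendant of w is observed.
            return w not in Z and not (descendants(edges, w) & set(Z))
        # Chain or common cause: closed iff w itself is observed.
        return w in Z

    # Winter network: the valve Sprinkler -> Wet Grass <- Rain.
    edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")}
    print(valve_closed(edges, "B", "D", "C", set()))  # True: D unobserved
    print(valve_closed(edges, "B", "D", "C", {"D"}))  # False: observing D opens it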

Complexity of d-separation
How many paths are there between nodes in X and Y? In general, exponentially many. So is d-separation practically useful?
Testing dsep(X, Z, Y) is equivalent to testing whether X and Y are disconnected in a new graph:
Delete outgoing edges from nodes in Z.
(Recursively) delete any leaf which does not belong to X ∪ Y ∪ Z.
So we can determine d-separation in linear time and space.
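That pruning test is straightforward to implement. A sketch (my encoding, with edges as (parent, child) pairs) that follows the two deletion steps above and then checks undirected connectivity:

    from collections import deque

    def dsep(edges, X, Z, Y):
        """dsep(X, Z, Y) via pruning and an undirected connectivity check."""
        X, Y, Z = set(X), set(Y), set(Z)
        # Step 1: delete outgoing edges from nodes in Z.
        edges = {(p, c) for (p, c) in set(edges) if p not in Z}
        nodes = {n for e in edges for n in e} | X | Y | Z
        # Step 2: recursively delete leaves outside X, Y and Z.
        keep = X | Y | Z
        changed = True
        while changed:
            changed = False
            for n in list(nodes):
                if n not in keep and not any(p == n for (p, _) in edges):
                    nodes.discard(n)
                    edges = {(p, c) for (p, c) in edges if c != n}
                    changed = True
        # d-separated iff X and Y are disconnected, ignoring edge direction.
        adj = {n: set() for n in nodes}
        for p, c in edges:
            adj[p].add(c)
            adj[c].add(p)
        frontier, seen = deque(X), set(X)
        while frontier:
            n = frontier.popleft()
            if n in Y:
                return False
            for m in adj[n] - seen:
                seen.add(m)
                frontier.append(m)
        return True

    # Winter network: Sprinkler and Rain are independent given Winter,
    # but observing Wet Grass opens the convergent valve between them.
    edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")}
    print(dsep(edges, {"B"}, {"A"}, {"C"}))       # True
    print(dsep(edges, {"B"}, {"A", "D"}, {"C"}))  # False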

Markov blanket
The Markov blanket of a variable X is a set of variables B such that X ∉ B and I(X, B, V \ B \ {X}).
Which variables B d-separate a variable X from all of the other variables (V \ B \ {X})?
The parents, children and spouses (the other parents of X's children).
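Reading the blanket off the structure, as a sketch using the same edge-set encoding as the earlier examples:

    def markov_blanket(edges, x):
        """Parents, children and spouses (other parents of x's children)."""
        parents = {p for (p, c) in edges if c == x}
        children = {c for (p, c) in edges if p == x}
        spouses = {p for (p, c) in edges if c in children and p != x}
        return parents | children | spouses

    # Winter network: the blanket of Rain (C) is {A, B, D, E}.
    edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")}
    print(markov_blanket(edges, "C"))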

Soundness, completeness and equivalence
Every independence found by d-separation is true for any distribution which factorizes according to the BN (soundness). There could be independencies that d-separation cannot find, because it only uses the structure.
What are the independencies given by these networks?
X → Y → Z and X ← Y ← Z
Both give exactly I(X, {Y}, Z). Different network structures can thus result in the same independencies. Such networks are Markov equivalent.

Terminology for d-separation
A BN is an independence map (I-MAP) of Pr if every independence declared by d-separation holds in Pr.
An I-MAP is minimal if it ceases to be an I-MAP when any edge is deleted.
A BN is a dependency map (D-MAP) of Pr if the lack of d-separation implies a dependence in Pr.

Class work
Use the network on the handout (the Asia network) to answer the following independence questions.
List the Markov blanket of every variable.
dsep(P, {A, T, C, S, B, D}, X)? No
dsep(P, {T, C}, {A, S})? Yes
dsep(P, {C, D}, B)? No
dsep(B, S, P)? Yes
dsep({B, C}, S, P)? No
dsep({B, C}, P, {A, T, X})? No

Recap
During this class, we discussed:
Basic terminology and notation for probability and graphs
Bayesian networks as a parameterized model
BNs as a factorization of a joint probability distribution
BNs as a concise representation of conditional independencies based on d-separation
Equivalence among BNs based on induced independencies

Next time in probabilistic models
Discriminative vs. generative learning
Multinomial naive Bayes for document classification (a class node C with children A_1, A_2, ..., A_m)
Hidden Markov models for gene prediction (hidden states S_1, ..., S_n emitting observations O_1, ..., O_n)