Bayesian Network Representation


Sargur Srihari, srihari@cedar.buffalo.edu

Topics
- Joint and Conditional Distributions
- I-Maps
- I-Map to Factorization
- Factorization to I-Map
- Perfect Map
- Knowledge Engineering: Picking Variables, Picking Structure, Picking Probabilities
- Sensitivity Analysis

Joint Distribution
A company is trying to hire recent graduates. The goal is to hire intelligent employees, but there is no way to test intelligence directly. The company does have access to each student's SAT score, which is informative but not fully indicative.
Two random variables:
- Intelligence: Val(I) = {i1, i0}, high and low
- Score: Val(S) = {s1, s0}, high and low
The joint distribution has 4 entries, so three parameters are needed.
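
As a concrete illustration, here is a minimal sketch of such a joint distribution in Python. The four probabilities are illustrative (chosen to be consistent with the CPDs used later in the student example), not values given on this slide.

```python
# Minimal sketch: an explicit joint distribution over Intelligence (I) and
# SAT score (S). The numbers are illustrative, not from the slide.
joint = {
    ("i0", "s0"): 0.665,
    ("i0", "s1"): 0.035,
    ("i1", "s0"): 0.060,
    ("i1", "s1"): 0.240,
}

# Four entries, but they must sum to 1, so only three are free parameters.
assert abs(sum(joint.values()) - 1.0) < 1e-9
print(len(joint), "entries,", len(joint) - 1, "free parameters")
```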

Alternative Representation: Conditional Parameterization
P(I, S) = P(I) P(S | I)
This representation is more compatible with causality: intelligence is influenced by genetics and upbringing, and the score is influenced by intelligence. (Note: BNs are not required to follow causality, but they often do.)
We need to specify P(I) and P(S | I), corresponding to the network Intelligence → SAT.
Three binomial distributions (3 parameters) are needed: one marginal P(I) and two conditionals P(S | I=i0), P(S | I=i1).
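
A sketch of the same joint built from the conditional parameterization. The three numbers for P(I) and P(S | I) are the ones that appear in the student-network CPDs later; read them as illustrative here.

```python
# Sketch of the conditional parameterization P(I, S) = P(I) P(S | I).
P_I = {"i0": 0.7, "i1": 0.3}                      # one free parameter
P_S_given_I = {"i0": {"s0": 0.95, "s1": 0.05},    # one free parameter each
               "i1": {"s0": 0.20, "s1": 0.80}}

# Recover the full joint distribution from the three parameters.
joint = {(i, s): P_I[i] * P_S_given_I[i][s]
         for i in P_I for s in ("s0", "s1")}
print(joint[("i1", "s1")])   # P(i1, s1) = 0.3 * 0.8 = 0.24
```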

Naïve Bayes Model
Conditional parameterization combined with conditional independence assumptions.
Val(G) = {g1, g2, g3} represents grades A, B, C.
Assumption: SAT and Grade are independent given Intelligence, (S ⊥ G | I). Knowing intelligence, the SAT score gives no information about the class grade.
Assertions:
- From probabilistic reasoning (chain rule): P(I, S, G) = P(I) P(S | I) P(G | S, I)
- From the assumption: P(G | S, I) = P(G | I)
- Combining: P(I, S, G) = P(I) P(S | I) P(G | I)
The network has Intelligence as the parent of both Grade and SAT.
Three binomials and two 3-valued multinomials: 7 parameters, more compact than the full joint distribution.
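
A quick parameter count under the stated assumptions (binary I and S, three-valued G); the snippet just makes the counting argument explicit and compares it with the full joint table.

```python
# Parameter count for the naive Bayes version of the student example (a sketch).
card = {"I": 2, "S": 2, "G": 3}

# Full joint over (I, G, S): table size minus one (entries sum to 1).
full_joint = card["I"] * card["G"] * card["S"] - 1            # 11

# Factorized: P(I) + P(S | I) + P(G | I)
factorized = (card["I"] - 1) \
    + card["I"] * (card["S"] - 1) \
    + card["I"] * (card["G"] - 1)                             # 1 + 2 + 4 = 7

print(full_joint, factorized)   # 11 7
```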

BN for the General Naïve Bayes Model
Class variable C with features X1, X2, ..., Xn:
P(C, X1, ..., Xn) = P(C) ∏_{i=1}^{n} P(Xi | C)
The distribution is encoded using a very small number of parameters, linear in the number of variables.
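
A minimal sketch of the general factorization as code. Only the formula P(C, X1, ..., Xn) = P(C) ∏ P(Xi | C) comes from the slide; the class prior, per-feature CPDs, and names below are made up for illustration.

```python
def naive_bayes_joint(p_c, p_x_given_c, c, xs):
    """P(C=c, X1=x1, ..., Xn=xn) = P(c) * prod_i P(xi | c)."""
    prob = p_c[c]
    for p_xi, xi in zip(p_x_given_c, xs):
        prob *= p_xi[c][xi]
    return prob

# Illustrative CPDs (assumed): a binary class and three binary features.
p_c = {"c0": 0.6, "c1": 0.4}
p_x_given_c = [
    {"c0": {0: 0.9, 1: 0.1}, "c1": {0: 0.3, 1: 0.7}},
    {"c0": {0: 0.8, 1: 0.2}, "c1": {0: 0.4, 1: 0.6}},
    {"c0": {0: 0.7, 1: 0.3}, "c1": {0: 0.5, 1: 0.5}},
]

# Parameter count is linear in n: 1 for P(C) plus 2 per binary feature here.
n = len(p_x_given_c)
print("parameters:", 1 + 2 * n)
print(naive_bayes_joint(p_c, p_x_given_c, "c1", (1, 0, 1)))  # 0.4*0.7*0.4*0.5
```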

Application of the Naïve Bayes Model: Medical Diagnosis
The Pathfinder expert system for lymph-node disease (Heckerman et al., 1992).
The full BN agreed with the human expert in 50/53 cases; the naïve Bayes model agreed in 47/53 cases.

Graphs and Distributions
Relating two concepts:
- Independencies in distributions
- Independencies in graphs
The I-map is a relationship between the two.

Independencies in a Distribution
Let P be a distribution over X. I(P) is the set of conditional independence assertions of the form (X ⊥ Y | Z) that hold in P.

  X   Y   P(X,Y)
  x0  y0  0.08
  x0  y1  0.32
  x1  y0  0.12
  x1  y1  0.48

X and Y are independent in P, e.g.:
P(x1) = 0.48 + 0.12 = 0.6, P(y1) = 0.32 + 0.48 = 0.8, P(x1, y1) = 0.48 = 0.6 × 0.8
Thus (X ⊥ Y | ∅) ∈ I(P).
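
The independence check on this slide can be done mechanically from the joint table. A small sketch (the probabilities are exactly those in the table above):

```python
# Check (X ⊥ Y) in P directly from the joint table.
P = {("x0", "y0"): 0.08, ("x0", "y1"): 0.32,
     ("x1", "y0"): 0.12, ("x1", "y1"): 0.48}

def marginal(var_index, value):
    return sum(p for assignment, p in P.items() if assignment[var_index] == value)

# Marginal independence: P(x, y) = P(x) P(y) for every entry.
for x in ("x0", "x1"):
    for y in ("y0", "y1"):
        assert abs(P[(x, y)] - marginal(0, x) * marginal(1, y)) < 1e-9

print(marginal(0, "x1"), marginal(1, "y1"), P[("x1", "y1")])  # 0.6 0.8 0.48
```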

Independencies in a Graph
The student network over Difficulty (D), Intelligence (I), Grade (G), SAT (S), Letter (L), with CPDs:

  P(D):        d0 = 0.6, d1 = 0.4
  P(I):        i0 = 0.7, i1 = 0.3
  P(G | I, D): i0,d0: g1 = 0.30, g2 = 0.40, g3 = 0.30
               i0,d1: g1 = 0.05, g2 = 0.25, g3 = 0.70
               i1,d0: g1 = 0.90, g2 = 0.08, g3 = 0.02
               i1,d1: g1 = 0.50, g2 = 0.30, g3 = 0.20
  P(S | I):    i0: s0 = 0.95, s1 = 0.05;  i1: s0 = 0.2, s1 = 0.8
  P(L | G):    g1: l0 = 0.1, l1 = 0.9;  g2: l0 = 0.4, l1 = 0.6;  g3: l0 = 0.99, l1 = 0.01

The graph with its CPDs is equivalent to a set of independence assertions:
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)

Local conditional independence assertions (starting from the leaf nodes):
I(G) = {(L ⊥ I, D, S | G), (S ⊥ D, G, L | I), (G ⊥ S | D, I), (I ⊥ D | ∅), (D ⊥ I, S | ∅)}
- L is conditionally independent of all other nodes given its parent G.
- S is conditionally independent of all other nodes given its parent I.
- Even given its parents, G is NOT independent of its descendant L.
- Nodes with no parents are marginally independent: D is independent of its non-descendants I and S.
The parents of a variable shield it from probabilistic influence: once the values of the parents are known, ancestors have no further influence. Information about descendants, however, can change beliefs about a node.
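
A sketch that assembles the joint from these CPDs via the factorization and sanity-checks one of the local assertions, (D ⊥ I). The dictionary layout and helper names are incidental; the numbers are the CPD entries above.

```python
from itertools import product

# Student network CPDs as listed above.
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D)
    ("i0", "d0"): {"g1": 0.30, "g2": 0.40, "g3": 0.30},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.70},
    ("i1", "d0"): {"g1": 0.90, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.50, "g2": 0.30, "g3": 0.20},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.20, "s1": 0.80}}  # P(S | I)
P_L = {"g1": {"l0": 0.10, "l1": 0.90}, "g2": {"l0": 0.40, "l1": 0.60},
       "g3": {"l0": 0.99, "l1": 0.01}}                                   # P(L | G)

# Joint from the factorization P(D, I, G, S, L) = P(D)P(I)P(G|D,I)P(S|I)P(L|G).
joint = {}
for d, i, g, s, l in product(P_D, P_I, ("g1", "g2", "g3"), ("s0", "s1"), ("l0", "l1")):
    joint[(d, i, g, s, l)] = P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

def prob(pred):
    return sum(p for a, p in joint.items() if pred(a))

# Sanity check one local independence: D and I are marginally independent.
for d in P_D:
    for i in P_I:
        assert abs(prob(lambda a: a[0] == d and a[1] == i) - P_D[d] * P_I[i]) < 1e-9

print("joint sums to", round(sum(joint.values()), 6))
```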

I-Map
Let G be a graph associated with a set of independencies I(G), and let P be a probability distribution with a set of independencies I(P). Then G is an I-map of P if I(G) ⊆ I(P).
From the direction of the inclusion, the distribution can have more independencies than the graph; the graph does not mislead about independencies that exist in P.

Example of an I-Map
Three graphs over X and Y:
- G0: X and Y disconnected, encoding X ⊥ Y, i.e., I(G0) = {(X ⊥ Y)}
- G1: X → Y, encoding no independencies, I(G1) = {}
- G2: X ← Y, encoding no independencies, I(G2) = {}

First distribution:
  X   Y   P(X,Y)
  x0  y0  0.08
  x0  y1  0.32
  x1  y0  0.12
  x1  y1  0.48
X and Y are independent in P, so G0, G1, and G2 are all I-maps of P.

Second distribution:
  X   Y   P(X,Y)
  x0  y0  0.4
  x0  y1  0.3
  x1  y0  0.2
  x1  y1  0.1
X and Y are not independent in P, so (X ⊥ Y) ∉ I(P). G0 is not an I-map of P, while G1 and G2 are.

If G is an I-map of P, it captures some of the independencies of P, not necessarily all.
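
A small sketch of the I-map test for these three graphs: since the only candidate assertion over two variables is (X ⊥ Y), checking I(G) ⊆ I(P) reduces to checking that single assertion against each joint table. Function names are ad hoc.

```python
def x_indep_y(P):
    """Check (X ⊥ Y) in a joint table P keyed by (x, y)."""
    px, py = {}, {}
    for (x, y), p in P.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return all(abs(P[(x, y)] - px[x] * py[y]) < 1e-9 for (x, y) in P)

def is_imap(graph_independencies, P):
    # G is an I-map of P iff every assertion in I(G) also holds in P.
    return all(assertion(P) for assertion in graph_independencies)

I_G0 = [x_indep_y]   # no edge: encodes (X ⊥ Y)
I_G1 = []            # X -> Y: encodes no independencies
I_G2 = []            # X <- Y: encodes no independencies

P1 = {("x0", "y0"): 0.08, ("x0", "y1"): 0.32, ("x1", "y0"): 0.12, ("x1", "y1"): 0.48}
P2 = {("x0", "y0"): 0.40, ("x0", "y1"): 0.30, ("x1", "y0"): 0.20, ("x1", "y1"): 0.10}

for name, P in (("P1", P1), ("P2", P2)):
    print(name, [is_imap(I_G, P) for I_G in (I_G0, I_G1, I_G2)])
# P1 [True, True, True]    all three graphs are I-maps of the independent P1
# P2 [False, True, True]   only the graphs with an edge are I-maps of P2
```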

I-Map to Factorization
A Bayesian network G encodes a set of conditional independence assumptions I(G). Every distribution P for which G is an I-map must satisfy these assumptions: every element of I(G) must be in I(P). This is the key property that allows a compact representation.

I-Map to Factorization
From the chain rule of probability:
P(I, D, G, L, S) = P(I) P(D | I) P(G | I, D) P(L | I, D, G) P(S | I, D, G, L)
This relies on no assumptions, but it is also not very helpful: the last factor alone requires evaluating 24 conditional probabilities.
Now apply the conditional independence assumptions induced by the graph:
- (D ⊥ I) ∈ I(P), therefore P(D | I) = P(D)
- (L ⊥ I, D | G) ∈ I(P), therefore P(L | I, D, G) = P(L | G)
- (S ⊥ D, G, L | I) ∈ I(P), therefore P(S | I, D, G, L) = P(S | I)
Thus we get
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)
which is a factorization into local probability models. In this way we can go from the graph to a factorization of P.
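
The compactness gained by this factorization can be made concrete by counting free parameters (cardinalities as in the student example: binary D, I, S, L and three-valued G). A sketch:

```python
# Parameter count: full chain-rule table vs. the factorization read off the graph.
card = {"D": 2, "I": 2, "G": 3, "S": 2, "L": 2}

full_joint = 1
for c in card.values():
    full_joint *= c
full_joint -= 1                                   # 48 entries -> 47 free parameters

factorized = (card["D"] - 1) + (card["I"] - 1) \
    + card["D"] * card["I"] * (card["G"] - 1) \
    + card["I"] * (card["S"] - 1) \
    + card["G"] * (card["L"] - 1)                 # 1 + 1 + 8 + 2 + 3 = 15

print(full_joint, factorized)   # 47 15
```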

Factorization to I-Map
We have seen that we can go from the independencies encoded in G, i.e., I(G), to a factorization of P. Conversely, factorization according to G implies the associated conditional independencies: if P factorizes according to G, then G is an I-map for P.
We need to show that if P factorizes according to G, then I(G) holds in P. Proof by example.

Example: Independencies in G Hold in P
P is defined by the set of CPDs of the student network (Difficulty, Intelligence, Grade, SAT, Letter). Consider the independence assertion for S in G, i.e., (S ⊥ D, G, L | I).
Starting from the factorization induced by the graph,
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G),
we can show that P(S | I, D, G, L) = P(S | I), which is exactly what we had assumed for P.
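
The step left implicit above can be written out in one short derivation from the factorization:

$$
P(S \mid D, G, L, I)
= \frac{P(D)\,P(I)\,P(G \mid D,I)\,P(S \mid I)\,P(L \mid G)}
       {\sum_{s} P(D)\,P(I)\,P(G \mid D,I)\,P(s \mid I)\,P(L \mid G)}
= \frac{P(S \mid I)}{\sum_{s} P(s \mid I)}
= P(S \mid I),
$$

since the common factors cancel and the conditional probabilities of S sum to 1.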

Perfect Map
- I-map: all independencies in I(G) are present in I(P). Trivial case: all nodes interconnected, so I(G) = {}.
- D-map: all independencies in I(P) are present in I(G). Trivial case: all nodes disconnected.
- Perfect map: both an I-map and a D-map.
Interestingly, not all distributions P over a given set of variables can be represented by a perfect map: in a Venn diagram, the set D of distributions representable by a perfect map is a proper subset of the set of all distributions P.

Knowledge Engineering
Going from a given distribution to a Bayesian network is more complex. We have a vague model of the world and need to crystallize it into a network structure and parameters. The task has several components, each of them subtle, and mistakes have consequences for the quality of the answers.

Three Tasks in Model Building
All three tasks are hard:
1. Picking variables: there are many ways to pick entities and attributes.
2. Determining structure: many structures are consistent with the same independencies.
3. Determining probabilities: eliciting probabilities from people is hard.

1. Picking Variables
The model should contain variables we can observe or that we will query. Choosing variables is one of the hardest tasks, with implications throughout the model.
A common problem is ill-defined variables. In a medical domain, consider the variable Fever: is it the temperature at the time of admission, or over a prolonged period? Measured by thermometer, or the internal temperature? The interaction of fever with other variables depends on the specific interpretation.

Need for Hidden Variables
There are several cholesterol tests. For accurate answers the patient should eat nothing after 10:00 pm; if the person eats, all tests become correlated.
Hidden variable: willpower (W). Including it renders the cholesterol tests conditionally independent given the true cholesterol level (C) and willpower: (Test A ⊥ Test B | C, W).
Hidden variables are introduced to avoid all variables becoming correlated. (Network: Cholesterol Level C and Willpower W are parents of Test A and Test B.)

Some Variables Are Not Needed
It is not necessary to include every variable. The SAT score may depend on partying the previous night, but the probability model already accounts for a poor score despite high intelligence.

Picking a Domain for Variables
A reasonable domain of values must be chosen. If the partition is not fine enough, the conditional independence assumptions may be false.
Task: determining cholesterol level (C) with two tests A and B, assuming (A ⊥ B | C). Let C be Normal if < 200 and High if > 200. Both tests tend to fail when the cholesterol level is marginal (say 210), so the conditional independence assumption is false. The fix is to introduce a Marginal value for C. (Network: Cholesterol Level C is the parent of Test A and Test B.)

2. Picking Structure
Many structures are consistent with the same set of independencies. Choose a structure that reflects the causal order and dependencies: causes should be parents of their effects. Causal graphs tend to be sparser.
Backward construction process: lung cancer should have smoking as a parent, and smoking should have gender as a parent.

Modeling Weak Influences
Reasoning in a Bayesian network depends strongly on connectivity. Adding edges can make the network expensive to use, so we make approximations that drop weak influences to decrease complexity. (Example network: a car's No Start variable with parents such as Battery, Gas, and Fault.)

3. Picking Probabilities
- Zero probabilities: a common mistake. An event may be extremely unlikely but not impossible; a zero can never be conditioned away, leading to irrecoverable errors.
- Orders of magnitude: small differences in low probabilities can make large differences in conclusions; 10^-4 is very different from 10^-5.
- Relative values: the probability of fever is higher with pneumonia than with flu.

  P(Fever | Disease)   High   Low
  Pneumonia            0.9    0.1
  Flu                  0.6    0.4
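
To make the orders-of-magnitude point concrete, here is an illustrative calculation with entirely made-up numbers: for the same evidence, priors of 10^-4 and 10^-5 leave posteriors roughly an order of magnitude apart.

```python
# Illustrative only: posterior of a rare event after evidence with a 1000:1
# likelihood ratio, starting from priors of 1e-4 vs 1e-5.
def posterior(prior, likelihood_ratio):
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

print(posterior(1e-4, 1000))   # ~0.09
print(posterior(1e-5, 1000))   # ~0.01  -- an order of magnitude smaller
```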

Sensitivity Analysis
A useful tool when estimating network parameters. It determines the extent to which a given probability parameter affects the outcome, which lets us decide whether it is important to get a particular CPD entry right, and helps figure out which CPD entries are responsible for an answer that does not match our intuition.
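
A minimal sketch of one-way sensitivity analysis on the small Intelligence → SAT fragment: sweep a single CPD entry and watch a query respond. The baseline numbers follow the earlier CPDs; the sweep values are illustrative.

```python
# One-way sensitivity analysis: vary P(s1 | i1) and observe the query P(i1 | s1).
def query_i1_given_s1(p_s1_given_i1, p_i1=0.3, p_s1_given_i0=0.05):
    num = p_i1 * p_s1_given_i1
    den = num + (1 - p_i1) * p_s1_given_i0
    return num / den

for theta in (0.6, 0.7, 0.8, 0.9):
    print(f"P(s1|i1) = {theta:.1f}  ->  P(i1|s1) = {query_i1_given_s1(theta):.3f}")
# A large swing in the query marks a parameter worth eliciting carefully;
# a flat response means the exact CPD entry matters little for this query.
```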