Building Bayesian Networks Lecture3: Building BN p.1

The focus today...
Problem solving by Bayesian networks
Designing Bayesian networks: qualitative part (structure); quantitative part (probability assessment)
Simplified Bayesian networks: in structure (Naïve Bayes, Tree-Augmented Networks); in probability assessment (parent divorcing, causal independence)
Lecture3: Building BN p.2

Problem solving. Bayesian networks are a declarative ("knowing what") knowledge-representation formalism with a mathematical basis. The problem to be solved is determined by (1) the entered evidence E (including potential decisions) and (2) the given hypothesis H: compute P(H | E). Examples: description of a population (or prior information); classification and diagnosis: D = arg max_H P(H | E), i.e., D is the hypothesis with maximum P(H | E); prediction; decision making based on what-if scenarios. Lecture3: Building BN p.3
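
As a small illustration of D = arg max_H P(H | E), the sketch below compares two hypotheses by P(E | H) P(H), since P(E) is the same for both. The numbers are hypothetical and only illustrative; they are not from the lecture.

```python
# Minimal MAP-diagnosis sketch; priors and likelihoods are illustrative numbers.
prior = {"WILSON'S DISEASE = yes": 0.01, "WILSON'S DISEASE = no": 0.99}
likelihood = {                       # P(E | H) for the entered evidence E
    "WILSON'S DISEASE = yes": 0.80,
    "WILSON'S DISEASE = no":  0.05,
}

# D = arg max_H P(H | E); P(E) is common to every H, so it suffices to
# compare the products P(E | H) * P(H).
score = {h: likelihood[h] * prior[h] for h in prior}
diagnosis = max(score, key=score.get)
print(diagnosis, score)              # "no" wins here: 0.0495 > 0.008
```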

Prior information. Gives a description of the population on which the assessed probabilities are based, i.e., the original probabilities before new evidence is uncovered: marginal probabilities P(V) for every vertex V, e.g., P(WILSON'S DISEASE = yes). Lecture3: Building BN p.4

Diagnostic problem solving. Gives a description of a subpopulation of the original population, or of individual cases: posterior marginal probabilities P(V | E) for every vertex V, e.g., P(WILSON'S DISEASE = yes | E), for entered evidence E (red vertices, with probability 1 for one of their values). Lecture3: Building BN p.5

Prediction of associated findings. Gives a description of the findings associated with a given class or category, such as Wilson's disease: posterior marginal probabilities P(V | E) for every vertex V, e.g., P(KAYSER-FLEISCHER RINGS = yes | E), with E the entered evidence. Lecture3: Building BN p.6

Design of Bayesian networks. Current design principle: start modelling qualitatively (different from traditional knowledge-based systems) and refine stepwise: causal graph (variables and relationships) → qualitative probabilistic network (domains of variables, qualitative probabilistic information) → quantitative network (numerical assessment by experts or from a dataset) → Bayesian network (evaluation). Lecture3: Building BN p.7

Terminology. Example network over the variables VisitToChina (no/yes), SARS (no/yes), FLU (no/yes), FEVER (no/yes), DYSPNOEA (no/yes) and TEMP (<=37.5 / >37.5). SARS is a Parent of its Child FEVER; SARS is an Ancestor of TEMP; DYSPNOEA is a Descendant of VisitToChina. Query node, e.g., FEVER. Evidence, e.g., VisitToChina and TEMP. Markov blanket, e.g., for SARS: {VisitToChina, DYSPNOEA, FEVER, FLU}. Lecture3: Building BN p.8
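
A Markov blanket can be computed mechanically as parents, children, and the children's other parents. Below is a minimal sketch, assuming the arcs implied by the relationships on the slide (VisitToChina → SARS, FLU → FEVER ← SARS, SARS → DYSPNOEA, FEVER → TEMP); it is illustrative code, not part of the lecture.

```python
# Edges of the example network, as implied by the slide's relationships.
edges = [("VisitToChina", "SARS"), ("FLU", "FEVER"), ("SARS", "FEVER"),
         ("SARS", "DYSPNOEA"), ("FEVER", "TEMP")]

def markov_blanket(edges, node):
    """Parents, children, and the children's other parents of `node`."""
    parents   = {p for p, c in edges if c == node}
    children  = {c for p, c in edges if p == node}
    coparents = {p for p, c in edges if c in children and p != node}
    return parents | children | coparents

print(markov_blanket(edges, "SARS"))
# {'VisitToChina', 'FEVER', 'DYSPNOEA', 'FLU'}
```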

Causal graph: topology (structure). Diagram: cause_1, ..., cause_n → effect. Identify the factors that are relevant; determine how those factors are causally related to each other. The arc cause → effect does mean that cause is a factor involved in causing effect. Lecture3: Building BN p.9

Causal graph: common effect. Diagram: cause_1, ..., cause_n → effect. An effect that has two or more ingoing arcs from other vertices is a common effect of those causes. Kinds of causal interaction: Synergy: POLLUTION → CANCER ← SMOKING; Prevention: VACCINE → DEATH ← SMALLPOX; XOR: ALKALI → DEATH ← ACID. Lecture3: Building BN p.10

Causal graph: common cause. Diagram: cause → effect_1, ..., effect_n. A cause that has two or more outgoing arcs to other vertices is a common cause (factor) of those effects. The effects of a common cause are usually observables (e.g., manifestations of failure of a device, or symptoms of a disease). Lecture3: Building BN p.11

Causal graph: example. Network: FLU → MYALGIA; FLU → FEVER ← PNEUMONIA; FEVER → TEMP. FLU and PNEUMONIA are two alternative causes of FEVER (but may enhance each other). FLU has two common effects: MYALGIA and FEVER. A high body TEMPerature is an indirect effect of FLU and PNEUMONIA, caused by FEVER. Lecture3: Building BN p.12

Check independence relationships. Network: FLU → MYALGIA; FLU → FEVER ← PNEUMONIA; FEVER → TEMP. Conditional independence X ⊥ Y | Z: Is {FEVER} ⊥ {MYALGIA} | {FLU}? Is {FEVER} ⊥ {PNEUMONIA} | {FLU}? Is {PNEUMONIA} ⊥ {FLU} | {FEVER}? Lecture3: Building BN p.13
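
These queries can be answered with a standard d-separation test. The sketch below is illustrative Python, not from the lecture; it uses the moralized-ancestral-graph criterion: X ⊥ Y | Z holds exactly when Z separates X from Y in the moral graph of the smallest ancestral set containing X ∪ Y ∪ Z.

```python
from itertools import combinations

def ancestral_set(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, X, Y, Z):
    """Test X _|_ Y | Z via separation in the moralized ancestral graph."""
    keep = ancestral_set(parents, set(X) | set(Y) | set(Z))
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:                       # connect each child to its parents
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):   # "marry" co-parents (moralization)
            adj[a].add(b); adj[b].add(a)
    seen, frontier = set(X), [v for v in X if v not in Z]
    while frontier:                        # undirected reachability avoiding Z
        v = frontier.pop()
        if v in Y:
            return False                   # X reaches Y, so not d-separated
        for w in adj[v]:
            if w not in seen and w not in Z:
                seen.add(w); frontier.append(w)
    return True

# FLU -> MYALGIA, FLU -> FEVER <- PNEUMONIA, FEVER -> TEMP
parents = {"MYALGIA": {"FLU"}, "FEVER": {"FLU", "PNEUMONIA"},
           "TEMP": {"FEVER"}, "FLU": set(), "PNEUMONIA": set()}
print(d_separated(parents, {"FEVER"}, {"MYALGIA"}, {"FLU"}))    # True
print(d_separated(parents, {"FEVER"}, {"PNEUMONIA"}, {"FLU"}))  # False
print(d_separated(parents, {"PNEUMONIA"}, {"FLU"}, {"FEVER"}))  # False
```

For this graph the test gives: FEVER ⊥ MYALGIA | FLU holds, while FEVER ⊥ PNEUMONIA | FLU does not (there is a direct arc), and PNEUMONIA ⊥ FLU | FEVER does not either, because conditioning on the common effect FEVER couples its causes (explaining away).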

Choose variables. If factors are mutually exclusive (with certainty they cannot occur together), put them as values of the same variable; if factors may co-occur, use multiple variables. (a) Single variable: DISEASE (pneu/flu) → MYALGIA, FEVER. (b) Multiple variables: FLU (yes/no) and PNEUMONIA (yes/no) → MYALGIA, FEVER. Lecture3: Building BN p.14

Choose values. Discrete values: mutually exclusive and exhaustive. Types: binary, e.g., FLU = yes/no, true/false, 0/1; ordinal, e.g., INCOME = low, medium, high; nominal, e.g., COLOR = brown, green, red; integral, e.g., AGE = {1,...,120}. Continuous values: discretization, e.g., for TEMP: [-50, +5) cold, [+5, +20) mild, [+20, +50] hot. Lecture3: Building BN p.15
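
As a small illustration of the discretization step, the hypothetical sketch below maps a continuous TEMP reading onto the three intervals given on the slide; it is not part of the lecture.

```python
def discretize_temp(temp_celsius: float) -> str:
    """Map a continuous temperature onto the slide's three intervals."""
    if -50 <= temp_celsius < 5:
        return "cold"
    elif 5 <= temp_celsius < 20:
        return "mild"
    elif 20 <= temp_celsius <= 50:
        return "hot"
    raise ValueError("temperature outside the modelled range [-50, +50]")

print(discretize_temp(12.3))   # mild
```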

Probability assessment. Qualitative: orders and equalities; check consistency; if the indegree of a vertex is > 4, apply divorcing or causal independence. Quantitative: assessment from experts or a dataset, yielding the probability distribution. Evaluation: compare predictions with the literature; compare with a test dataset. Lecture3: Building BN p.16

Expert judgements. Qualitative orders, e.g., for P(GENERAL HEALTH STATUS | AGE):

AGE    | ordering
10-69  | good > average > poor
70-79  | average > good > poor
80-89  | average > poor > good
90     | poor > average > good

Equalities, e.g.: P(CANCER = T1 | AGE = 15-29) = P(CANCER = T2 | AGE = 15-29). Lecture3: Building BN p.17

Expert judgements (cont.). Quantitative, subjective probabilities P(GHS | AGE):

AGE    | good | average | poor
10-69  | 0.99 | 0.008   | 0.002
70-79  | 0.3  | 0.5     | 0.2
80-89  | 0.1  | 0.5     | 0.4
90     | 0.1  | 0.3     | 0.6

Lecture3: Building BN p.18
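
The "check consistency" step from the assessment workflow (p.16) can be automated here: verify that every row of the elicited table sums to 1 and respects the qualitative ordering from p.17. A minimal, illustrative sketch (not from the lecture):

```python
# P(GHS | AGE) from this slide, and the qualitative orders from p.17.
values = ("good", "average", "poor")
table = {            # age group: P(good), P(average), P(poor)
    "10-69": (0.99, 0.008, 0.002),
    "70-79": (0.3, 0.5, 0.2),
    "80-89": (0.1, 0.5, 0.4),
    "90":    (0.1, 0.3, 0.6),
}
orders = {           # expert's qualitative ordering, most to least probable
    "10-69": ("good", "average", "poor"),
    "70-79": ("average", "good", "poor"),
    "80-89": ("average", "poor", "good"),
    "90":    ("poor", "average", "good"),
}

for age, probs in table.items():
    p = dict(zip(values, probs))
    assert abs(sum(probs) - 1.0) < 1e-9, f"row {age} does not sum to 1"
    ranked = tuple(sorted(values, key=p.get, reverse=True))
    assert ranked == orders[age], f"row {age} violates the qualitative order"
print("quantitative table is consistent with the qualitative orders")
```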

A bottleneck in Bayesian networks. Example: RABIES, EAR_INFECTION, FLU and BRONCHITIS are all causes of FEVER. The number of parameters for the effect given n causes grows exponentially: 2^n (for binary causes). Unlikely evidence combinations must still be quantified, e.g., P(fever | flu, rabies, ear_infection) = ? Problem: for many BNs, too many probabilities have to be assessed. Lecture3: Building BN p.19

Special-form Bayesian networks. Solution: use a simpler probabilistic model, such that either the structure becomes simpler, e.g., the naive (independent) form BN or the Tree-Augmented Bayesian Network (TAN), or the assessment of the conditional probabilities becomes simpler (even though the structure is still complex), e.g., parent divorcing or a causal-independence BN. Lecture3: Building BN p.20

Independent (naive) form BN. Structure: class variable C with arcs to the evidence variables E_1, ..., E_m. C is the class variable, the E_i are evidence variables, and E ⊆ {E_1,...,E_m}. We have E_i ⊥ E_j | C for i ≠ j. Hence, using Bayes' rule:

P(C | E) = P(E | C) P(C) / P(E)

with:

P(E | C) = ∏_{E_i ∈ E} P(E_i | C)   (by conditional independence)
P(E) = Σ_C P(E | C) P(C)   (marginalization and conditioning)

Lecture3: Building BN p.21

Example of Naive Bayes Lecture3: Building BN p.22

Example of Naive Bayes (1). Learning phase: estimate the conditional probability tables P(Outlook | PlayTennis), P(Temperature | PlayTennis), P(Humidity | PlayTennis) and P(Wind | PlayTennis), with values Outlook = Sunny/Overcast/Rain, Temperature = Hot/Mild/Cool, Humidity = High/Normal, Wind = Strong/Weak, together with the priors P(PlayTennis = Yes) and P(PlayTennis = No). Lecture3: Building BN p.23

Example of Naive Bayes (2). Testing phase (inference): naive Bayes network with class node PlayTennis and evidence nodes Outlook, Temp, Humidity and Wind. Lecture3: Building BN p.24

Example of Naive Bayes (3). Testing phase (inference) for the instance x = (Outlook = Sunny, Temp = Cool, Humidity = High, Wind = Strong):

P(Yes) P(Sunny | Yes) P(Cool | Yes) P(High | Yes) P(Strong | Yes) = 0.0053
P(No) P(Sunny | No) P(Cool | No) P(High | No) P(Strong | No) = 0.0206

Since P(Yes | x) ∝ 0.0053 < 0.0206 ∝ P(No | x), the instance is classified as PlayTennis = No. Note: P(Yes | x) and P(No | x) share the normalizing constant P(x), so the two products can be compared directly. Lecture3: Building BN p.25
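
The two scores on this slide can be reproduced in a few lines. The sketch below is illustrative and assumes the conditional probabilities of the standard 14-example PlayTennis data set (Mitchell, 1997), which are not shown on the slide; with those values the products come out as 0.0053 and 0.0206, matching the slide.

```python
# Priors and conditional probabilities P(E_i | C), assumed to be those
# estimated from the standard 14-example PlayTennis data set.
prior = {"Yes": 9/14, "No": 5/14}
cpt = {
    "Outlook=Sunny": {"Yes": 2/9, "No": 3/5},
    "Temp=Cool":     {"Yes": 3/9, "No": 1/5},
    "Humidity=High": {"Yes": 3/9, "No": 4/5},
    "Wind=Strong":   {"Yes": 3/9, "No": 3/5},
}

x = ["Outlook=Sunny", "Temp=Cool", "Humidity=High", "Wind=Strong"]

score = {}
for c in prior:                    # P(C) * prod_i P(E_i | C)
    s = prior[c]
    for e in x:
        s *= cpt[e][c]
    score[c] = s

print({c: round(s, 4) for c, s in score.items()})  # {'Yes': 0.0053, 'No': 0.0206}
print("PlayTennis =", max(score, key=score.get))   # No
```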

Tree-Augmented BN (TAN). Structure: class node C with arcs to evidence nodes E_1, ..., E_5, plus tree-structured arcs among the evidence nodes. Extension of naive Bayes: reduces the number of independence assumptions. Each node has at most two parents (one of which is the class node). Lecture3: Building BN p.26

Divorcing multiple parents. (a) Original network: SURGERY, DRUG TREATMENT and GENERAL HEALTH are all parents of SURVIVAL. (b) Divorced network: SURGERY and DRUG TREATMENT become parents of a new node POST-THERAPY SURVIVAL, which together with GENERAL HEALTH is a parent of SURVIVAL. This reduces the number of probabilities to assess: identify a potential common effect of two or more parent vertices of a vertex, and introduce a new variable into the network representing that common effect. Lecture3: Building BN p.27
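
To see the reduction quantitatively, one can count conditional-probability table entries before and after divorcing. The sketch below is illustrative; the cardinality of three values per variable is an assumption, not from the slide.

```python
def cpt_entries(child_card, parent_cards):
    """Entries in P(child | parents): one child distribution per parent configuration."""
    size = child_card
    for card in parent_cards:
        size *= card
    return size

d = 3  # assumed number of values per variable (not stated on the slide)

# (a) Original: P(SURVIVAL | SURGERY, DRUG TREATMENT, GENERAL HEALTH)
original = cpt_entries(d, [d, d, d])

# (b) Divorced: P(POST-THERAPY SURVIVAL | SURGERY, DRUG TREATMENT)
#             + P(SURVIVAL | POST-THERAPY SURVIVAL, GENERAL HEALTH)
divorced = cpt_entries(d, [d, d]) + cpt_entries(d, [d, d])

print(original, "->", divorced)   # 81 -> 54 entries when every variable has 3 values
```

For binary variables this particular three-parent example breaks even (16 entries either way); the saving grows quickly with the number of parents being grouped and the number of values per variable, which is exactly the high-indegree situation flagged on p.16.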

Causal independence. Structure: cause variables C_1, C_2, ..., C_n, each with an arc to its own intermediate variable I_1, I_2, ..., I_n, which feed through a deterministic function f into the effect variable E. Here P(E | I_1,...,I_n) ∈ {0, 1}, and the interaction function f is defined such that

f(I_1,...,I_n) = e    if P(e | I_1,...,I_n) = 1
f(I_1,...,I_n) = ¬e   if P(e | I_1,...,I_n) = 0

Lecture3: Building BN p.28

Causal independence: BN. With cause variables C_1,...,C_n, intermediate variables I_1,...,I_n and interaction function f:

P(e | C_1,...,C_n) = Σ_{I_1,...,I_n} P(e | I_1,...,I_n) P(I_1,...,I_n | C_1,...,C_n)
                   = Σ_{f(I_1,...,I_n) = e} P(e | I_1,...,I_n) P(I_1,...,I_n | C_1,...,C_n)

Note that since I_i ⊥ I_j and I_i ⊥ C_j | C_i for i ≠ j, it holds that:

P(I_1,...,I_n | C_1,...,C_n) = ∏_{k=1}^{n} P(I_k | C_k)

Conclusion: assess P(I_i | C_i) instead of P(E | C_1,...,C_n), i.e., 2n vs. 2^n probabilities. Lecture3: Building BN p.29
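
The decomposition above can be implemented directly by enumerating the intermediate variables. The sketch below is illustrative (the names and the example probabilities 0.8 and 0.7 are assumptions, not from the lecture); the interaction function f is passed in as a Boolean function, so the same code covers a noisy OR, a noisy AND, or a noisy XOR.

```python
from itertools import product

def causal_independence(p_i_given_c, causes, f):
    """
    P(e | C_1..C_n) = sum over I_1..I_n with f(I) = e of  prod_k P(I_k | C_k).

    p_i_given_c[k](c) = P(I_k = true | C_k = c); causes is a tuple of booleans.
    """
    total = 0.0
    for i_vals in product([False, True], repeat=len(causes)):
        if not f(i_vals):                       # keep only configurations with f(I) = e
            continue
        prob = 1.0
        for k, i_k in enumerate(i_vals):
            p_true = p_i_given_c[k](causes[k])
            prob *= p_true if i_k else 1.0 - p_true
        total += prob
    return total

# Assumed example: two causes, P(i_k | c_k) = 0.8 and 0.7, P(i_k | not c_k) = 0.
p = [lambda c: 0.8 if c else 0.0,
     lambda c: 0.7 if c else 0.0]

noisy_or  = lambda i: any(i)
noisy_and = lambda i: all(i)

print(round(causal_independence(p, (True, True), noisy_or), 4))   # 0.94 = 1 - 0.2*0.3
print(round(causal_independence(p, (True, True), noisy_and), 4))  # 0.56 = 0.8*0.7
```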

Causal independence: noisy OR. Structure: C_1 → I_1, C_2 → I_2, with I_1 and I_2 combined by a logical OR into E. The interaction among the causes, as represented by the function f and P(E | I_1, I_2), is a logical OR. Meaning: presence of any one of the causes C_i with absolute certainty will cause the effect e (i.e., E = true). P(e | C_1, C_2) = ? Lecture3: Building BN p.30

Causal independence: noisy OR (cont.).

P(e | C_1, C_2) = Σ_{I_1,I_2} P(e | I_1, I_2, C_1, C_2) P(I_1, I_2 | C_1, C_2) = ?

Lecture3: Building BN p.31

Causal independence: noisy OR (cont.).

P(e | C_1, C_2) = Σ_{I_1,I_2} P(e | I_1, I_2, C_1, C_2) P(I_1, I_2 | C_1, C_2)
               = Σ_{f(I_1,I_2) = e} P(e | I_1, I_2) ∏_{k=1,2} P(I_k | C_k)

Lecture3: Building BN p.32

Causal independence: noisy OR (cont.).

I_1 | I_2 | P(e | I_1, I_2) | f(I_1, I_2)
 0  |  0  | 0               | ¬e
 0  |  1  | 1               | e
 1  |  0  | 1               | e
 1  |  1  | 1               | e

P(e | C_1, C_2) = Σ_{f(I_1,I_2) = e} P(e | I_1, I_2) ∏_{k=1,2} P(I_k | C_k)
               = P(i_1 | C_1) P(i_2 | C_2) + P(¬i_1 | C_1) P(i_2 | C_2) + P(i_1 | C_1) P(¬i_2 | C_2)

Lecture3: Building BN p.33
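
For the noisy OR, the three-term sum above equals the complement form P(e | C_1, C_2) = 1 − P(¬i_1 | C_1) P(¬i_2 | C_2). A quick numeric check; the values 0.8 and 0.7 are assumed for illustration and are not from the lecture.

```python
# Assumed probabilities for both causes being present:
# P(i_1 | c_1) = 0.8, P(i_2 | c_2) = 0.7
p1, p2 = 0.8, 0.7

three_term_sum  = p1 * p2 + (1 - p1) * p2 + p1 * (1 - p2)  # the slide's expansion
complement_form = 1 - (1 - p1) * (1 - p2)                  # equivalent closed form

print(round(three_term_sum, 4), round(complement_form, 4))  # 0.94 0.94
```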

Noisy OR: real-world example. A dynamic Bayesian network for predicting the development of hypertensive disorders during pregnancy. The node VascRisk (vascular risk) has 11 causes, and its original CPT requires the estimation of 20736 entries: practically impossible! Solution: use a noisy OR to simplify it. Lecture3: Building BN p.34

Causal independence: noisy AND. Structure: C_1 → I_1, C_2 → I_2, with I_1 and I_2 combined by a logical AND into E. The interaction among the causes, as represented by the function f and P(E | I_1, I_2), is a logical AND. Meaning: presence of all causes C_i with absolute certainty will cause the effect e (i.e., E = true); otherwise, ¬e. P(e | C_1, C_2) = ? Lecture3: Building BN p.35

Are Bayesian networks always suitable? "Essentially, all models are wrong, but some are useful" (George Box and Norman Draper (1987), Empirical Model-Building and Response Surfaces, Wiley). Considerations: the problem (modelling) objective, e.g., for function approximation or pure numeric prediction without a need to explain the results, a black-box model such as a neural network can be sufficient; sufficient knowledge about the problem (domain experts, literature, data); complexity of the problem, e.g., is it decomposable? Lecture3: Building BN p.36

Refining causal graphs Model refinement is necessary. How: Manual Automatic What: Probability adjustment Removing irrelevant factors Adding previously hidden, unknown factors Causal relationships adjustment, e.g., including, removing independence relations Lecture3: Building BN p.37