Intelligent Systems: Reasoning and Recognition. Reasoning with Bayesian Networks


Intelligent Systems: Reasoning and Recognition
James L. Crowley
ENSIMAG 2 / MoSIG M1, Second Semester 2016/2017
Lesson 13, 24 March 2017

Reasoning with Bayesian Networks

Outline:
Naïve Bayesian Systems
   Example Problem
   Probability Distribution Tables
   Limitations of Naïve Bayesian Systems
Bayesian Networks
   Conditional Probability Tables
   Independent Random Variables
   Conditional Independence
   Chain Rule
   Conditional Independence Properties
   Computing with Conditional Probability Tables
   A Joint Distribution in Structured Form
Bayesian Network Construction
   Construction Process
Reasoning with Bayesian Networks
   Explaining Away
Pearl's Network Construction Algorithm

Sources:
1. Bishop, C., "Pattern Recognition and Machine Learning" (Information Science and Statistics), Springer, New York, 2007.
2. Koller, D., and Friedman, N., "Probabilistic Graphical Models: Principles and Techniques", MIT Press, 2009.
3. Kelleher, J. D., Mac Namee, B., and D'Arcy, A., "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", MIT Press, 2015.

Naïve Bayesian Systems

Example Problem

For today's lecture we will illustrate Bayesian reasoning with the problem of diagnosing lung disease.

A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer. The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to. A positive X-ray would indicate either tuberculosis or lung cancer. Recent travel in Asia can increase the risk of tuberculosis.

Probability Distribution Tables

A possible approach would be to build a histogram that gives the probability of cancer given the symptoms. For example, consider the set of Boolean random variables:

C: Cancer - the patient has lung cancer
S: Smoker - the patient smokes
X: X-Ray - the patient's X-ray is positive
D: Dyspnoea - shortness of breath
B: Bronchitis - the patient has bronchitis
P: Pollution - the atmosphere is polluted
A: Asthma - the patient has asthma
T: Tuberculosis - the patient has tuberculosis

Collect a sample of M patients {X_m}, where each patient is described by the Boolean vector of variables X_m = (C, S, X, D, B, P, A, T).

Construct a table h(C, S, X, D, B, P, A, T) with Q = 2^8 cells and count the number of times each vector (C, S, X, D, B, P, A, T) occurs in the sample population. Normalized by M, the table h(C, S, X, D, B, P, A, T) is a probability distribution table: it lists the probability of each joint outcome of the variables.
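To make this concrete, here is a small Python sketch (not part of the original lesson) that builds such a probability distribution table by counting; the patient records in it are invented purely for illustration.

```python
# Sketch: building the probability distribution table h(C, S, X, D, B, P, A, T)
# by counting Boolean vectors in a sample population.  The samples are invented.
from collections import Counter
from itertools import product

VARS = ("C", "S", "X", "D", "B", "P", "A", "T")   # 8 Boolean variables, Q = 2^8 cells

# Hypothetical sample of M patients, each a Boolean 8-tuple in the order above.
samples = [
    (False, True,  False, True,  True,  False, False, False),
    (True,  True,  True,  True,  False, True,  False, False),
    (False, False, False, False, False, False, True,  False),
    # ... in practice M would be large
]

counts = Counter(samples)          # h(C, S, X, D, B, P, A, T) as raw counts
M = len(samples)

# Normalizing the counts gives the joint probability distribution table.
pdt = {cell: counts.get(cell, 0) / M for cell in product([False, True], repeat=8)}

print(sum(pdt.values()))           # 1.0 : the joint distribution sums to one
```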

We can use the table to build a naive Bayesian diagnostic system. For a probability distribution P(A, B), the sum rule tells us that

   Σ_x P(x, B) = P(B)

Bayes rule (conditional probability) can then be written

   P(A | B) = P(A, B) / P(B) = P(A, B) / Σ_x P(x, B)

Consider a table constructed from C = the patient has lung cancer, S = the patient smokes, X = the patient's X-ray is positive and D = dyspnoea (shortness of breath). Then

   P(C=T | S, X, D) = h(C=T, S, X, D) / Σ_x h(x, S, X, D)
                    = h(C=T, S, X, D) / (h(C=T, S, X, D) + h(C=F, S, X, D))

Limitations of Naïve Bayesian Systems

A probability distribution table requires one cell for each possible combination of values of all the variables. For D Boolean variables, the size of the table is Q = 2^D. However, random variables can take on values other than Boolean. Variables can be:

Ordered symbols: X ∈ {low, medium, high}
Arbitrary (unordered) symbols: X ∈ {English, French, Spanish}
Integer values: X ∈ {1, 2, ..., 120}
Sets of symbols: X ∈ {{A, B}, {A, C}, {B, C, D}}

For real problems Q can grow very large very fast. In the general case the size of the table is the product of the numbers of values of the variables,

   Q = N_1 · N_2 · ... · N_D

where N_d is the number of possible values of the d-th random variable.

In addition, such a table only describes correlations among observations. It does not allow us to reason about causes and relations, and it does not allow us to explain causes and effects. For this, we can use Bayesian Networks.
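Continuing the sketch above, the diagnostic query P(C=T | S, X, D) can be computed directly from the joint table with the sum rule. The function below is a hypothetical illustration, not code from the lesson.

```python
# Sketch: a naive diagnostic query on the joint table pdt built above.
# P(C=T | S=s, X=x, D=d) = (mass with C=True and the observed S, X, D)
#                          / (mass with the observed S, X, D, any C).
def p_cancer_given(pdt, s, x, d):
    num = 0.0
    den = 0.0
    for (c, s_, x_, d_, b, p, a, t), prob in pdt.items():
        if (s_, x_, d_) == (s, x, d):
            den += prob
            if c:
                num += prob
    return num / den if den > 0 else float("nan")

print(p_cancer_given(pdt, s=True, x=True, d=True))
```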

Bayesian Networks

Bayesian Networks (BNs) are graphical models for reasoning under uncertainty, in which the nodes represent random variables (discrete or continuous) and the arcs represent relations between variables. Arcs are often causal connections, but can represent other forms of association. Bayesian networks allow probabilistic beliefs about random variables to be updated automatically as new information becomes available.

The nodes in a Bayesian network represent the probabilities of random variables X from the domain. In our previous lectures these were referred to as "features". Directed arcs (or links) connect pairs of nodes, X1 → X2, representing the direct dependencies between random variables.

For example: Fire causes Smoke. Let F = Fire and S = Smoke. We can use a graphical model to represent this causal relation as the arc F → S. Adding a third random variable, H = Heat, the statement "Fire causes Smoke and Heat" is expressed by the two arcs F → S and F → H.

Graphical models can also express multiple possible causes for diagnostic reasoning. For example, "Fire can be caused by an Electrical problem (E) or by a Cigarette (C)" is expressed by the arcs E → F and C → F.

The strength of the relationship between variables is quantified by the conditional probability distribution associated with each node. These distributions are represented by Conditional Probability Tables.
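A minimal way to hold such a graph in code is a map from each node to its parents. The sketch below is an illustration, not part of the lesson; it encodes the two toy graphs just described.

```python
# Sketch: representing a Bayesian network's graph as a map node -> list of parents.

# Fire -> Smoke, Fire -> Heat  (one cause, two effects)
fire_effects = {"F": [], "S": ["F"], "H": ["F"]}

# Electrical problem -> Fire <- Cigarette  (two possible causes of one effect)
fire_causes = {"E": [], "C": [], "F": ["E", "C"]}

def children(graph, node):
    """Nodes that list `node` among their parents."""
    return [n for n, parents in graph.items() if node in parents]

print(children(fire_effects, "F"))   # ['S', 'H']
print(fire_causes["F"])              # ['E', 'C']
```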

Conditional Probability Tables

Bayesian Networks factor a large Probability Distribution Table (PDT) into a set of much smaller Conditional Probability Tables (CPTs). Factoring a PDT requires that the variables be conditionally independent.

Independent Random Variables

Two random variables are independent if

   P(A, B) = P(A) · P(B)

Formally, this is written A ⊥ B. Independence implies that P(A | B) = P(A).

Demonstration:

   P(A | B) = P(A, B) / P(B) = P(A) P(B) / P(B) = P(A)

Conditional Independence

Conditional independence occurs when observations A and B are independent given a third observation C. Conditional independence tells us that when we know C, evidence of B does not change the likelihood of A. If A and B are independent given C, then P(A | B, C) = P(A | C). Formally:

   A ⊥ B | C  ⟺  P(A | B, C) = P(A | C)

Note that conditional independence is symmetric:

   A ⊥ B | C  ⟺  B ⊥ A | C  ⟺  P(B | A, C) = P(B | C)

A typical situation is that both A and B result from the same cause C. For example, Fire causes Smoke and Heat. When A is conditionally independent of B given C, we can also write:

   P(A, B | C) = P(A | B, C) P(B | C) = P(A | C) P(B | C)
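The following sketch checks this definition numerically on a small joint distribution built so that A ⊥ B | C holds; all numeric values are invented for illustration.

```python
# Sketch: verifying P(A | B, C) = P(A | C) on a joint distribution constructed
# so that A and B are conditionally independent given C (values are invented).
from itertools import product

p_C = {True: 0.3, False: 0.7}
p_A_given_C = {True: 0.8, False: 0.1}    # P(A=True | C)
p_B_given_C = {True: 0.6, False: 0.2}    # P(B=True | C)

def bern(p_true, val):                   # P(X=val) for a Boolean X with P(X=True)=p_true
    return p_true if val else 1.0 - p_true

joint = {(a, b, c): bern(p_A_given_C[c], a) * bern(p_B_given_C[c], b) * p_C[c]
         for a, b, c in product([False, True], repeat=3)}

def cond(joint, a, given):               # P(A=a | given), given = dict of fixed variables
    num = sum(p for (ai, bi, ci), p in joint.items()
              if ai == a and all({"B": bi, "C": ci}[k] == v for k, v in given.items()))
    den = sum(p for (ai, bi, ci), p in joint.items()
              if all({"B": bi, "C": ci}[k] == v for k, v in given.items()))
    return num / den

print(cond(joint, True, {"B": True, "C": True}))   # 0.8 : knowing B adds nothing
print(cond(joint, True, {"C": True}))              # 0.8 : same as P(A | C)
```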

Chain Rule

When A and B are conditionally independent given C,

   P(A | B, C) = P(A | C)
   P(A, B | C) = P(A | C) P(B | C)

When conditioned on C, the probability distribution table P(A, B) factors into a product of marginal distributions P(A | C) and P(B | C). Conditional independence thus allows us to factor a Probability Distribution Table into a product of much smaller Conditional Probability Tables.

Bayesian networks explicitly express the conditional independencies in a probability distribution and allow probability distributions to be computed using the chain rule.

Conditional Independence Properties

We can identify several useful properties of conditional independence:

Symmetry:       (A ⊥ B | C)  ⟹  (B ⊥ A | C)
Decomposition:  (A ⊥ B, C | D)  ⟹  (A ⊥ B | D)
Weak Union:     (A ⊥ B, C | D)  ⟹  (A ⊥ B | D, C)

Computing with Conditional Probability Tables

Conditional independence allows us to factor a Probability Distribution Table into a product of much smaller Conditional Probability Tables. For example, let F = Fire, S = Smoke and H = Heat. Then

   P(F, S, H) = P(S | F) P(H | F) P(F)

Each factor is described by a Conditional Probability Table. Each row of the table must sum to 1. To simplify the table, most authors do not include the last column; its values are obtained by subtracting the sum of the other columns from 1.

Arcs link a "Parent" node to a "Child" node. In F → S, Fire is the parent of Smoke; this is written Parent(S) = F. The set of all parents of a node X_n is the function Parents(X_n). In general,

   P(X_1, X_2, ..., X_D) = Π_n P(X_n | Parents(X_n))
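As an illustration (not from the lesson), the sketch below encodes this factorization directly. The CPT values are those implied by the worked computation that follows, and should be read as assumptions about the network's tables.

```python
# Sketch: the factored joint P(F, S, H) = P(S|F) P(H|F) P(F) as a product of CPTs.
# Assumed values: P(F)=0.10, P(S|F)=0.90, P(S|not F)=0.001, P(H|F)=0.99, P(H|not F)=0.0001.
from itertools import product

p_F = {True: 0.10, False: 0.90}
p_S_given_F = {True: 0.90, False: 0.001}     # P(S=True | F)
p_H_given_F = {True: 0.99, False: 0.0001}    # P(H=True | F)

def bern(p_true, val):
    return p_true if val else 1.0 - p_true

def joint(f, s, h):
    """P(F=f, S=s, H=h) = P(S=s | F=f) * P(H=h | F=f) * P(F=f)."""
    return bern(p_S_given_F[f], s) * bern(p_H_given_F[f], h) * p_F[f]

# The factored joint is a valid distribution: it sums to 1 over all 2^3 cells.
print(sum(joint(f, s, h) for f, s, h in product([False, True], repeat=3)))   # 1.0
```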

We can use the network to answer questions. For example: what is the probability of fire if we see smoke?

   P(F | S) = P(F, S) / P(S)

For this we need the joint probability of fire and smoke, P(F, S). We get this as the product of the nodes, marginalizing over Heat. Using the CPT values P(F) = 0.10, P(S|F) = 0.90, P(S|¬F) = 0.001, P(H|F) = 0.99 and P(H|¬F) = 0.0001:

   P(F, S) = Σ_H P(F, S, H) = Σ_H P(H | F) P(S | F) P(F)
           = P(H | F) P(S | F) P(F) + P(¬H | F) P(S | F) P(F)
           = 0.99 × 0.90 × 0.10 + 0.01 × 0.90 × 0.10 = 0.09

What is the probability of seeing smoke?

   P(S) = Σ_F Σ_H P(F, S, H) = Σ_F Σ_H P(H | F) P(S | F) P(F)
        = P(H | F) P(S | F) P(F) + P(¬H | F) P(S | F) P(F)
          + P(H | ¬F) P(S | ¬F) P(¬F) + P(¬H | ¬F) P(S | ¬F) P(¬F)
        = (0.99 × 0.90 × 0.10) + (0.01 × 0.90 × 0.10) + (0.0001 × 0.001 × 0.90) + (0.9999 × 0.001 × 0.90)
        ≈ 0.0909

From which we have:

   P(F | S) = P(F, S) / P(S) = 0.09 / 0.0909 ≈ 0.99

A Joint Distribution in Structured Form

A Bayesian Network is a joint distribution in structured form. The network is a directed acyclic graph. Dependence and independence are represented by the presence or absence of edges:

Node = random variable
Directed edge = conditional dependence
Absence of an edge = conditional independence
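The same computation, reusing joint() from the sketch above:

```python
# Sketch: answering "what is the probability of fire given smoke?" by
# marginalizing out Heat, exactly as in the computation above.
p_FS = sum(joint(True, True, h) for h in (False, True))                       # P(F, S) = 0.09
p_S = sum(joint(f, True, h) for f in (False, True) for h in (False, True))   # P(S) ~ 0.0909
print(p_FS, p_S, p_FS / p_S)    # 0.09  0.0909  ~0.99
```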

The graph shows the conditional (and causal) relationships; the tables provide the data for computing the probability distribution. Four canonical structures over three variables are:

Marginal independence (no edges):
   P(A, B, C) = P(A) P(B) P(C)

Independent causes (common effect, A → C ← B):
   P(A, B, C) = P(C | A, B) P(A) P(B)

Markov dependence (causal chain, A → B → C):
   P(A, B, C) = P(C | B) P(B | A) P(A)

Common cause (B ← A → C):
   P(A, B, C) = P(B | A) P(C | A) P(A)

As before, arcs link a "Parent" node to a "Child" node: in A → B, A is the parent of B, written Parent(B) = A, and the joint distribution is the product Π_n P(X_n | Parents(X_n)).

A series of arcs lists ancestors and descendants. In the chain A → B → C, node A is an ancestor of C and node C is a descendant of A.

Markov Blanket

The Markov blanket of a node C, MB(C), consists of its parents, its children, and all of its children's other parents. The Markov blanket contains all the variables that shield the node from the rest of the network: it is the only knowledge needed to predict the behavior of that node. The children's parents are included because they can be used to explain away the node in question (this is described below).
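With the parents-map representation used in the earlier sketches, the Markov blanket can be computed as follows (illustrative code, not part of the lesson):

```python
# Sketch: computing the Markov blanket (parents, children, and the children's
# other parents) from a parents-map representation of the graph.
def markov_blanket(parents, node):
    blanket = set(parents[node])                              # parents
    kids = [n for n, ps in parents.items() if node in ps]     # children
    blanket.update(kids)
    for kid in kids:
        blanket.update(p for p in parents[kid] if p != node)  # children's other parents
    return blanket

# Toy graph: E -> F <- C, F -> S, F -> H
graph = {"E": [], "C": [], "F": ["E", "C"], "S": ["F"], "H": ["F"]}
print(markov_blanket(graph, "F"))   # {'E', 'C', 'S', 'H'}
print(markov_blanket(graph, "E"))   # {'F', 'C'}  (C is a co-parent of F)
```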

Bayesian Network Construction

Construction Process

Networks are generally constructed by hand. Automatic construction (learning) of networks is an active research area.

To construct a Bayesian Network, a knowledge engineer must identify the relevant random variables (features) and their possible values, determine their dependence and causal relations, and determine the conditional probability tables. The knowledge engineer first constructs a network that captures the relations between variables, and then determines the Conditional Probability Tables.

1) Identify the relevant random variables and their values.

The knowledge engineer must identify the relevant random variables (features) and their possible values. The values may be Boolean, symbolic, discrete numbers, or even probability density functions. The values of a random variable must be both mutually exclusive and exhaustive. It is important to choose values that efficiently represent the domain.

2) Define the structure.

The knowledge engineer then constructs a network that captures the qualitative relations between variables. Two nodes should be connected directly if one affects or causes the other, with the arc indicating the direction of the effect. Causal relations are important, but other forms of correlation are possible. The topology of the network captures the qualitative relations between random variables. Note that networks are NOT unique: different networks can produce the same probability distribution table.

3) Determine the Conditional Probability Tables.

Once the network structure is established, we need to determine the Conditional Probability Tables (CPTs). Each table lists the probability of each value of the child for each combination of values of its parent nodes. Each combination of parent values is called an instantiation of the parents. For each possible instantiation, we specify (or learn) the probability of each possible value of the child. The network can then be used for reasoning. A sketch of such a hand-constructed structure for the lung-disease example is given below.
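For illustration, one plausible hand-constructed structure for the lung-disease example is sketched below. This exact topology is an assumption read off the problem statement; the lesson does not fix it.

```python
# Sketch: a plausible (assumed) structure for the lung-disease example.
# Each node lists its parents; a CPT must then be specified or learned for
# every instantiation of those parents.
lung_structure = {
    "P": [],               # Pollution
    "S": [],               # Smoker
    "T": [],               # Tuberculosis (risk raised by recent travel in Asia)
    "C": ["P", "S"],       # Cancer: affected by pollution and smoking
    "B": ["S"],            # Bronchitis: affected by smoking
    "X": ["C", "T"],       # positive X-ray indicates cancer or tuberculosis
    "D": ["C", "B", "T"],  # Dyspnoea: caused by cancer, bronchitis or tuberculosis
}

# Example CPT size: node "D" has 3 Boolean parents, so its CPT has 2^3 = 8
# rows (instantiations), each giving P(D=True | C, B, T).
```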

Reasoning with Bayesian Networks

Bayesian networks provide full representations of probability distributions over their variables and support several types of reasoning. Reasoning (inference) occurs as a flow of information through the network. This is sometimes called propagation, belief updating, or even conditioning. Note that information flow is not limited to the directions of the arcs.

Example: a fire (F) can be caused by an electrical problem (E) or a cigarette (C). The fire causes smoke (S) and heat (H).

Diagnostic Reasoning

Diagnostic reasoning is reasoning from symptoms to causes; it occurs in the opposite direction to the network arcs. For example, observing smoke raises our belief that there was a fire, and in turn that there was an electrical problem.

Predictive Reasoning

Predictive reasoning is reasoning from new information about causes to new beliefs about effects, following the directions of the network arcs. If we discover an electrical problem, we can predict that it caused a fire. Note that prediction is not a statement about time, but about estimation of likelihood.

Intercausal Reasoning

Intercausal reasoning involves reasoning about the mutual causes of a common effect. This is also called "explaining away".

Explaining Away

Suppose that there are exactly two possible causes of a particular effect, represented by a v-structure in the BN. For example, a fire (F) could be caused by an electrical problem (E) or by a cigarette (C): E → F ← C. Initially these two causes are independent. Suppose that we then find evidence of smoking. This new information explains the fire, which in turn lowers the probability that the fire was caused by an electrical problem. Even though the two causes are initially independent, with knowledge of one cause the alternative cause is explained away. The parent nodes become dependent given information about the common effect; they are said to be conditionally dependent:

   P(E | F, C) ≠ P(E | F)
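A small numeric illustration of explaining away on the v-structure E → F ← C; all CPT values below are invented for illustration.

```python
# Sketch: explaining away.  P(E | F) rises above the prior, but adding the
# evidence C=True ("a cigarette caused it") lowers P(E | F, C) again.
from itertools import product

p_E, p_C = 0.1, 0.2                                     # priors on the two causes
p_F_given = {(True, True): 0.95, (True, False): 0.8,    # P(F=True | E, C)
             (False, True): 0.7, (False, False): 0.01}

def joint(e, c, f):
    pf = p_F_given[(e, c)]
    return (p_E if e else 1 - p_E) * (p_C if c else 1 - p_C) * (pf if f else 1 - pf)

def p_E_given(**obs):                                   # P(E=True | observations)
    num = sum(joint(e, c, f) for e, c, f in product([False, True], repeat=3)
              if e and all({"C": c, "F": f}[k] == v for k, v in obs.items()))
    den = sum(joint(e, c, f) for e, c, f in product([False, True], repeat=3)
              if all({"C": c, "F": f}[k] == v for k, v in obs.items()))
    return num / den

print(p_E_given())                 # prior P(E) = 0.10
print(p_E_given(F=True))           # P(E | F) rises above the prior (~0.38)
print(p_E_given(F=True, C=True))   # P(E | F, C) drops back (~0.13): C explains the fire away
```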

Pearl's Network Construction Algorithm

In his 1988 textbook, Judea Pearl proposed the following algorithm for constructing a Bayesian Network.

1) Choose a set of relevant variables {X_d} that describe the problem.
2) Choose an order for the variables: [X_1, X_2, ..., X_D].
3) For each variable X_d from d = 1 to D:
   a) Create a network node for X_d.
   b) Determine the minimal set of previous nodes (from X_1 to X_{d-1}) on which X_d depends. These are the parents of X_d, Parents(X_d), chosen so that

      P(X_d | X_1, ..., X_{d-1}) = P(X_d | Parents(X_d)),  with Parents(X_d) ⊆ {X_1, ..., X_{d-1}}

   c) Define the Conditional Probability Table (CPT) for X_d.

Note that different node orders may result in different network structures, all representing the same joint probability distribution. The problem is to order the variables from cause to symptom so that the network representation is compact (as few arcs as possible).
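A sketch of this procedure in Python, assuming access to a conditional-independence test is_independent(X, rest, given); that oracle (or a statistical test on data) is not provided here and is the main assumption.

```python
# Sketch of Pearl's construction procedure.  For each variable in the chosen
# order, keep the smallest set of earlier variables that renders it independent
# of the remaining earlier variables.
from itertools import combinations

def build_network(order, is_independent):
    """Return a dict mapping each variable to its parent set."""
    parents = {}
    for d, X in enumerate(order):
        predecessors = order[:d]
        parents[X] = set(predecessors)       # fallback: depend on all predecessors
        done = False
        for size in range(len(predecessors) + 1):
            for subset in combinations(predecessors, size):
                rest = [v for v in predecessors if v not in subset]
                if not rest or is_independent(X, rest, given=list(subset)):
                    parents[X] = set(subset) # minimal parent set found
                    done = True
                    break
            if done:
                break
        # a CPT P(X | parents[X]) would be specified or learned at this point
    return parents
```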