Bayesian Networks in Epistemology and Philosophy of Science
Lecture 1: Bayesian Networks
Center for Logic and Philosophy of Science, Tilburg University, The Netherlands
Formal Epistemology Course, Northern Institute of Philosophy, Aberdeen, June 2010

Motivation

Bayesian Networks represent probability distributions over many variables X_i. They encode information about conditional probabilistic independencies between the X_i. Bayesian Networks can be used to examine more complicated (= more realistic) situations. This helps us to relax many of the idealizations that philosophers usually make. I introduce the theory of Bayesian Networks and present various applications to epistemology and philosophy of science.

Organizational Issues

Procedure: a mix of lecture and exercise units.

Literature:
1. Bovens, L. and S. Hartmann (2003): Bayesian Epistemology. Oxford: Oxford University Press (Ch. 3).
2. Dizadji-Bahmani, F., R. Frigg and S. Hartmann (2010): Confirmation and Reduction: A Bayesian Account. To appear in Erkenntnis.
3. Hartmann, S. and W. Meijs (2010): Walter the Banker: The Conjunction Fallacy Reconsidered. To appear in Synthese.
4. Neapolitan, R. (2004): Learning Bayesian Networks. London: Prentice Hall (= the recommended textbook; Chs. 1 and 2).
5. Pearl, J. (1988): Probabilistic Reasoning in Intelligent Systems. San Francisco: Morgan Kaufmann (= the classic text).

Overview

Lecture 1: Bayesian Networks
1. Probability Theory
2. Bayesian Networks
3. Partially Reliable Sources

Lecture 2: Applications in Philosophy of Science
1. A Survey
2. Intertheoretic Reduction
3. Open Problems

Lecture 3: Applications in Epistemology
1. A Survey
2. Bayesianism Meets the Psychology of Reasoning
3. Open Problems

Probability Theory

Let S = {A, B, ...} be a collection of sentences, and let P be a probability function. P satisfies the Kolmogorov axioms:

Kolmogorov Axioms
1. P(A) ≥ 0
2. P(A) = 1 if A is true in all models
3. P(A ∨ B) = P(A) + P(B) if A and B are mutually exclusive

Some consequences:
1. P(¬A) = 1 − P(A)
2. P(A ∨ B) = P(A) + P(B) − P(A, B), where P(A, B) := P(A ∧ B)
3. P(A) = Σ_{i=1}^{n} P(A, B_i) if B_1, ..., B_n are exhaustive and mutually exclusive (Law of Total Probability)

Conditional Probabilities

Definition: Conditional Probability
P(A | B) := P(A, B) / P(B), provided P(B) ≠ 0.

Bayes' Theorem:
$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)} = \frac{P(A \mid B)\, P(B)}{P(A \mid B)\, P(B) + P(A \mid \neg B)\, P(\neg B)} = \frac{P(B)}{P(B) + x\, P(\neg B)}$$
with the likelihood ratio x := P(A | ¬B) / P(A | B).

Conditional Independence

Definition: (Unconditional) Independence
A and B are independent iff P(A, B) = P(A) · P(B), equivalently P(A | B) = P(A), equivalently P(B | A) = P(B).

Definition: Conditional Independence
A is conditionally independent of B given C iff P(A | B, C) = P(A | C).

Example: A = yellow fingers, B = lung cancer, C = smoking. A and B are positively correlated, i.e. learning that a person has A raises the probability of B. Yet once we know C, learning A leaves the probability of B unchanged. C is called the common cause of A and B.

Propositional Variables

We introduce two-valued propositional variables A, B, ... (in italics). Their values are A and ¬A (in roman script), etc. Conditional independence, denoted by A ⊥ B | C, is a relation between propositional variables (or sets of variables). A ⊥ B | C holds if P(A | B, C) = P(A | C) for all values of A, B and C. (See exercise 4.) The relation is symmetric: A ⊥ B | C ⟺ B ⊥ A | C.

Question: Which further conditions does the conditional independence relation satisfy?
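To make the common-cause example concrete, here is a minimal Python sketch. The numerical values are purely illustrative (they are not from the lecture): smoking C has some prior, and yellow fingers A and lung cancer B each depend only on C. The script verifies that A and B are correlated unconditionally but independent given C.

```python
from itertools import product

# Illustrative (made-up) parameters for the common-cause structure A <- C -> B.
p_C = 0.3                              # P(C): person smokes
p_A_given = {True: 0.8, False: 0.1}    # P(A | C), P(A | not C)
p_B_given = {True: 0.2, False: 0.02}   # P(B | C), P(B | not C)

# Joint distribution P(A, B, C) = P(C) P(A | C) P(B | C).
joint = {}
for a, b, c in product([True, False], repeat=3):
    pc = p_C if c else 1 - p_C
    pa = p_A_given[c] if a else 1 - p_A_given[c]
    pb = p_B_given[c] if b else 1 - p_B_given[c]
    joint[(a, b, c)] = pc * pa * pb

def prob(pred):
    """Probability of the event described by pred(a, b, c)."""
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))

# Unconditional correlation: P(B | A) exceeds P(B).
p_B = prob(lambda a, b, c: b)
p_B_given_A = prob(lambda a, b, c: a and b) / prob(lambda a, b, c: a)
print(p_B, p_B_given_A)

# Conditional independence: P(B | A, C) equals P(B | C), as the definition requires.
p_B_given_AC = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: a and c)
p_B_given_C = prob(lambda a, b, c: b and c) / prob(lambda a, b, c: c)
print(p_B_given_AC, p_B_given_C)
```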

Semi-Graphoid Axioms

The conditional independence relation satisfies the following conditions:

Semi-Graphoid Axioms
1. Symmetry: X ⊥ Y | Z ⟹ Y ⊥ X | Z
2. Decomposition: X ⊥ Y, W | Z ⟹ X ⊥ Y | Z
3. Weak Union: X ⊥ Y, W | Z ⟹ X ⊥ Y | W, Z
4. Contraction: X ⊥ Y | Z & X ⊥ W | Y, Z ⟹ X ⊥ Y, W | Z

With these axioms, new conditional independencies can be derived from known independencies.

Joint and Marginal Probability

To specify the joint probability distribution of two binary propositional variables A and B, three probability values have to be given.

Example: P(A, B) = .2, P(A, ¬B) = .1, and P(¬A, B) = .6.
Note: since Σ_{A,B} P(A, B) = 1, it follows that P(¬A, ¬B) = .1.

In general, 2^n − 1 values have to be specified to fix the joint distribution over n binary variables. With the joint probability, we can calculate marginal probabilities.

Definition: Marginal Probability
P(A) = Σ_B P(A, B)

Illustration: A: patient has lung cancer, B: X-ray test is reliable.

Joint and Marginal Probability (cont'd)

The joint probability distribution contains everything we need to calculate all conditional and marginal probabilities involving the respective variables:

Conditional Probability
$$P(A_1, \dots, A_m \mid A_{m+1}, \dots, A_n) = \frac{P(A_1, \dots, A_n)}{P(A_{m+1}, \dots, A_n)}$$

Marginal Probability
$$P(A_{m+1}, \dots, A_n) = \sum_{A_1, \dots, A_m} P(A_1, \dots, A_m, A_{m+1}, \dots, A_n)$$

A Venn Diagram Representation
[Figure: Venn diagram with regions labelled P(A), P(B), P(A, B), P(¬A, B), and P(¬A).]
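As a quick numerical companion to these definitions, the following Python sketch recovers the fourth joint value from normalization and computes a marginal and a conditional probability from the example's joint distribution.

```python
# Joint distribution over two binary variables A and B (example values from the text).
joint = {(True, True): 0.2, (True, False): 0.1, (False, True): 0.6}
# The remaining entry is fixed by normalization: all entries must sum to 1.
joint[(False, False)] = 1.0 - sum(joint.values())
print(joint[(False, False)])                      # approximately 0.1

# Marginal probability: P(A) = sum_B P(A, B).
p_A = sum(p for (a, b), p in joint.items() if a)
print(p_A)                                        # 0.2 + 0.1 = 0.3

# Conditional probability: P(A | B) = P(A, B) / P(B).
p_B = sum(p for (a, b), p in joint.items() if b)
print(joint[(True, True)] / p_B)                  # 0.2 / 0.8 = 0.25
```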

Representing a Joint Probability Distribution

Venn diagrams and the specification of all entries of P(A_1, ..., A_n) are not the most efficient ways to represent a joint probability distribution. There is also a problem of computational complexity: specifying the joint probability distribution over n binary variables requires the specification of 2^n − 1 probability values.

The trick: use information about conditional independencies that hold between (sets of) variables. This reduces the number of values that have to be specified. Bayesian Networks do just this.

An Example from Medicine

Two variables: T: patient has tuberculosis; X: positive X-ray. Given information:
  t := P(T) = .01
  p := P(X | T) = .95 = 1 − P(¬X | T) = 1 − rate of false negatives
  q := P(X | ¬T) = .02 = rate of false positives

Our task is to determine P(T | X). Apply Bayes' Theorem:
$$P(T \mid X) = \frac{P(X \mid T)\, P(T)}{P(X \mid T)\, P(T) + P(X \mid \neg T)\, P(\neg T)} = \frac{p\, t}{p\, t + q\,(1 - t)} = \frac{t}{t + \bar{t}\, x} \approx .32$$
with the likelihood ratio x := q/p and t̄ := 1 − t.

A Bayesian Network Representation

The corresponding network has a single arrow T → X. Parlance: T causes X; T directly influences X.
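A minimal Python check of this calculation, using the numbers given above:

```python
t, p, q = 0.01, 0.95, 0.02   # prior, hit rate, false-positive rate

# Bayes' theorem in the form p*t / (p*t + q*(1 - t)).
posterior = p * t / (p * t + q * (1 - t))

# Equivalent likelihood-ratio form: t / (t + (1 - t) * x) with x = q / p.
x = q / p
posterior_lr = t / (t + (1 - t) * x)

print(round(posterior, 3), round(posterior_lr, 3))   # both approximately 0.324
```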

A More Complicated (= More Realistic) Scenario

[Figure: a larger network with the variables Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Positive X-Ray, and Dyspnoea.]

Directed Acyclic Graphs

A directed graph G(V, E) consists of a finite set of nodes V and an irreflexive binary relation E on V. A directed acyclic graph (DAG) is a directed graph which does not contain cycles.

Some Vocabulary

Parents of a node A: par(A). Further vocabulary: ancestor, child node, descendants, non-descendants, root node. [Figure: an example DAG with nodes A, B, C, D, E, F, G, H illustrating these notions.]

The Parental Markov Condition

Definition: The Parental Markov Condition (PMC)
A variable is conditionally independent of its non-descendants given its parents.

Standard example: the common cause situation.

Definition: Bayesian Network
A Bayesian Network is a DAG together with a probability distribution which respects the PMC.
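The vocabulary above is easy to make precise in code. Here is a minimal sketch using a small hypothetical DAG (the edges of the lecture's A–H example graph are not reproduced in the transcription) that reads off parents, root nodes, descendants and non-descendants from a child → parents dictionary.

```python
# A hypothetical DAG given as child -> list of parents:
# A -> B -> D, A -> C, C -> D
parents = {'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}
nodes = {'A', 'B', 'C', 'D'}

def children(node):
    return {c for c, ps in parents.items() if node in ps}

def descendants(node):
    """All nodes reachable from `node` by following arrows (excluding `node` itself)."""
    found, stack = set(), [node]
    while stack:
        for c in children(stack.pop()):
            if c not in found:
                found.add(c)
                stack.append(c)
    return found

def non_descendants(node):
    return nodes - descendants(node) - {node}

roots = {n for n in nodes if not parents.get(n)}
print(roots)                      # {'A'}
print(parents.get('D', []))       # parents of D: ['B', 'C']
print(descendants('B'))           # {'D'}
print(non_descendants('B'))       # {'A', 'C'}
```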

Three Examples

[Figure: the three canonical three-node structures over A, B, C: the chain, the collider (arrows converging on one node), and the common cause (one node with arrows to the other two).]

Bayesian Networks at Work

How can one calculate probabilities with a Bayesian Network?

The Product Rule
$$P(A_1, \dots, A_n) = \prod_{i=1}^{n} P(A_i \mid \mathrm{par}(A_i))$$

Proof idea: start with a suitable ancestral ordering, then apply the Chain Rule and then the PMC (cf. exercises 3 & 6). That is, the joint probability distribution is determined by the prior probabilities of all root nodes (par(A) = ∅) and the conditional probabilities of all other nodes given their parents. This requires the specification of no more than n · 2^{m_max} values, where m_max is the maximal number of parents.

An Example

[Figure: the collider H → REP ← R.]

  P(H) = h, P(R) = r
  P(REP | H, R) = 1, P(REP | ¬H, R) = 0
  P(REP | H, ¬R) = a, P(REP | ¬H, ¬R) = a

$$P(H \mid \mathrm{REP}) = \frac{P(H, \mathrm{REP})}{P(\mathrm{REP})} = \frac{\sum_{R} P(H, R, \mathrm{REP})}{\sum_{H,R} P(H, R, \mathrm{REP})} = \frac{P(H) \sum_{R} P(R)\, P(\mathrm{REP} \mid H, R)}{\sum_{H,R} P(H)\, P(R)\, P(\mathrm{REP} \mid H, R)} = \frac{h\,(r + a\,\bar{r})}{h\, r + a\,\bar{r}}$$

Some More Theory: d-separation

There are typically more independencies in a Bayesian Network than the ones directly delivered by the PMC (see the examples below). Is there a systematic way to find all independencies that hold in a given Bayesian Network? Yes: d-separation. Let A, B, and C be sets of variables. Then the following theorem holds:

Theorem: d-Separation and Independence
A ⊥ B | C iff C d-separates A from B.

So what is d-separation?
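The calculation above can be reproduced mechanically with the Product Rule: enumerate all value assignments, multiply each node's (conditional) probability, and sum. A minimal Python sketch for the H → REP ← R example, with illustrative values for h, r and a (not from the lecture):

```python
from itertools import product

h, r, a = 0.1, 0.7, 0.2   # illustrative values for P(H), P(R) and the parameter a

def p_rep(rep, H, R):
    """P(REP | H, R) as specified in the example."""
    p = 1.0 if H else 0.0      # reliable source (R true): the report tracks H
    if not R:
        p = a                  # unreliable source: report with probability a, regardless of H
    return p if rep else 1 - p

# Product Rule: P(H, R, REP) = P(H) * P(R) * P(REP | H, R).
def joint(H, R, REP):
    return (h if H else 1 - h) * (r if R else 1 - r) * p_rep(REP, H, R)

# P(H | REP) = sum_R P(H, R, REP) / sum_{H, R} P(H, R, REP)
num = sum(joint(True, R, True) for R in (True, False))
den = sum(joint(H, R, True) for H, R in product((True, False), repeat=2))
print(num / den)

# Closed form derived in the text: h (r + a (1 - r)) / (h r + a (1 - r))
print(h * (r + a * (1 - r)) / (h * r + a * (1 - r)))
```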

Example 1

[Figure: a small network with nodes A, B and C.]

PMC ⟹ C ⊥ A | B. But is it also the case that A ⊥ C | B? This does not follow directly from the PMC. It can, however, be derived from C ⊥ A | B and the Symmetry axiom for semi-graphoids.

Example 2

[Figure: the network R₁ → REP₁ ← H → REP₂ ← R₂.]

  PMC ⟹ REP₁ ⊥ REP₂ | H, R₁  (*)
  But the PMC does not directly yield REP₁ ⊥ REP₂ | H.
  However: PMC ⟹ R₁ ⊥ H, REP₂
  Weak Union ⟹ R₁ ⊥ REP₂ | H  (**)
  (*), (**), Symmetry & Contraction ⟹ R₁, REP₁ ⊥ REP₂ | H
  Decomposition & Symmetry ⟹ REP₁ ⊥ REP₂ | H

d-separation

Definition: d-Separation
A path p is d-separated (or blocked) by a set of nodes Z iff there is a node w on p satisfying one of the following:
1. w has converging arrows along p (u → w ← v) and neither w nor any of its descendants is in Z.
2. w does not have converging arrows along p and w ∈ Z.

Theorem: d-Separation and Independence (again)
If Z blocks every path from X to Y, then Z d-separates X from Y, and X ⊥ Y | Z.

How to Construct a Bayesian Network

1. Specify all relevant variables.
2. Specify all conditional independencies which hold between them.
3. Construct a Bayesian Network which exhibits these conditional independencies.
4. Check other (perhaps unwanted) independencies with the d-separation criterion. Modify the network if necessary.
5. Specify the prior probabilities of all root nodes and the conditional probabilities of all other nodes, given their parents.
6. Calculate the (marginal or conditional) probabilities you are interested in using the Product Rule.
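The path-blocking criterion can also be checked algorithmically. One standard equivalent test (not spelled out in the lecture) uses the moralized ancestral graph: restrict the DAG to the ancestors of X ∪ Y ∪ Z, connect ("marry") the parents of every common child, drop the arrow directions, delete Z, and check whether X and Y are still connected. A minimal Python sketch, applied to Example 2:

```python
from collections import deque
from itertools import combinations

def ancestors(parents, nodes):
    """All ancestors of the given nodes, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """True iff zs d-separates xs from ys in the DAG given as a child -> parents dict."""
    relevant = ancestors(parents, set(xs) | set(ys) | set(zs))
    # Moralize the ancestral subgraph: link each node to its parents and
    # connect all parents of a common child; then treat edges as undirected.
    adj = {n: set() for n in relevant}
    for child in relevant:
        ps = parents.get(child, [])
        for p in ps:
            adj[child].add(p); adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # Delete the conditioning set and test whether xs and ys are still connected.
    seen = {x for x in xs if x not in zs}
    frontier = deque(seen)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False          # an active path exists
        for m in adj[node]:
            if m not in seen and m not in zs:
                seen.add(m)
                frontier.append(m)
    return True

# Example 2: R1 -> REP1 <- H -> REP2 <- R2
parents = {'REP1': ['H', 'R1'], 'REP2': ['H', 'R2']}
print(d_separated(parents, {'REP1'}, {'REP2'}, {'H'}))    # True: REP1 and REP2 independent given H
print(d_separated(parents, {'REP1'}, {'REP2'}, set()))    # False: dependent without conditioning on H
```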

Plan

The proof of the pudding is in the eating! Guiding question: when we receive information from independent and partially reliable sources, what is our degree of confidence that this information is true? This raises two issues: what does independence mean here, and what does partial reliability mean?

A. Independence

Assume that there are n facts (represented by propositional variables F_i) and n corresponding reports (represented by propositional variables REP_i) by partially reliable witnesses (testimonies, scientific instruments, etc.). Assume that, given the corresponding fact, a report is independent of all other reports and of all other facts; they do not matter for the report. That is, we assume independent reports:

  REP_i ⊥ F_1, REP_1, ..., F_{i−1}, REP_{i−1}, F_{i+1}, REP_{i+1}, ..., F_n, REP_n | F_i   for all i = 1, ..., n.

B. Partial Reliability

To model partially reliable information sources, additional model assumptions have to be made. We examine two models.

Model I: Fixed Reliability

Paradigm: medical testing.

[Figure: a network with facts F₁, F₂, F₃ and reports REP₁, REP₂, REP₃, where each REP_i has F_i as its only parent.]

  P(REP_i | F_i) = p
  P(REP_i | ¬F_i) = q < p

Measuring Reliability

We assume positive reports. In the network, we specify two parameters that characterize the reliability of the sources, namely p := P(REP_i | F_i) and q := P(REP_i | ¬F_i).

Definition: Reliability
r := 1 − q/p, with p > q (confirmatory reports).

This definition makes sense:
1. If q = 0, then the source is maximally reliable.
2. If p = q, then the facts do not matter for the report and the source is maximally unreliable.

Note that any other normalized decreasing function of q/p would also work, and the results that obtain do not depend on this choice.

Model II: Variable Reliability, Fixed Random Parameter

Paradigm: scientific instruments.

[Figure: a network with facts F₁, F₂, F₃, reliability variables R₁, R₂, R₃, and reports REP₁, REP₂, REP₃, where each REP_i has F_i and R_i as its parents.]
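A brief numerical illustration of Model I (the prior f for the fact is an illustrative assumption, not from the lecture): the posterior after one positive report depends on p and q only through the ratio q/p, i.e. through the reliability r, since P(F | REP) = f / (f + (1 − f)(1 − r)).

```python
def posterior_model1(f, p, q):
    """P(F | REP) for one positive report in Model I, with prior P(F) = f."""
    return p * f / (p * f + q * (1 - f))

f = 0.3                     # illustrative prior for the fact
for p, q in [(0.9, 0.45), (0.9, 0.09), (0.9, 0.0)]:
    r = 1 - q / p           # reliability as defined in the text
    print(r, posterior_model1(f, p, q))
# r = 0.5  -> posterior about 0.46
# r = 0.9  -> posterior about 0.81
# r = 1.0  -> posterior 1.0   (maximally reliable source)
```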

Model II: Variable Reliability, Fixed Random Parameter (cont'd)

  P(REP_i | F_i, R_i) = 1    P(REP_i | ¬F_i, R_i) = 0
  P(REP_i | F_i, ¬R_i) = a   P(REP_i | ¬F_i, ¬R_i) = a

Model IIa: Testing One Hypothesis

[Figure: a network with a single hypothesis H, reliability variables R₁, R₂, R₃, and reports REP₁, REP₂, REP₃, where each REP_i has H and R_i as its parents.]

  P(REP_i | H, R_i) = 1    P(REP_i | ¬H, R_i) = 0
  P(REP_i | H, ¬R_i) = a   P(REP_i | ¬H, ¬R_i) = a

(A numerical sketch of this model follows after the Outlook.)

Outlook

1. The Parental Markov Condition is part of the definition of a Bayesian Network.
2. The d-separation criterion helps us to identify all conditional independencies in a Bayesian Network.
3. We constructed two basic models of partially reliable information sources: (i) endogenous reliability (paradigm: medical testing) and (ii) exogenous reliability (paradigm: scientific instruments).
4. In the following two lectures, we will examine applications of Bayesian Networks in philosophy of science (Lecture 2) and epistemology (Lecture 3).
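To close, a sketch of Model IIa with n = 3 positive reports. The prior h for the hypothesis and the prior r for each reliability variable R_i are illustrative assumptions (the single-report example in "An Example" above used the same structure and CPTs); the script computes P(H | REP_1, REP_2, REP_3) by brute-force application of the Product Rule and compares it with the closed form obtained by factorizing the product over i.

```python
from itertools import product

h, r, a = 0.1, 0.7, 0.2     # illustrative priors for H and each R_i, and the parameter a
n = 3                       # three positive reports REP_1, REP_2, REP_3

def p_rep(H, R):
    """P(REP_i = true | H, R_i) in Model IIa."""
    if R:
        return 1.0 if H else 0.0
    return a

# Product Rule: sum over H and all R_i of P(H) * prod_i P(R_i) * P(REP_i | H, R_i).
def prob(H_fixed=None):
    total = 0.0
    for H in ([H_fixed] if H_fixed is not None else [True, False]):
        for rs in product([True, False], repeat=n):
            term = h if H else 1 - h
            for R in rs:
                term *= (r if R else 1 - r) * p_rep(H, R)
            total += term
    return total

posterior = prob(True) / prob()
print(posterior)

# Closed form: h (r + a (1 - r))^n / [h (r + a (1 - r))^n + (1 - h) (a (1 - r))^n]
rbar = 1 - r
print(h * (r + a * rbar) ** n / (h * (r + a * rbar) ** n + (1 - h) * (a * rbar) ** n))
```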