Introduction to Bayesian Networks

Ying Wu
Electrical Engineering and Computer Science, Northwestern University
Evanston, IL 60208
http://www.eecs.northwestern.edu/~yingwu

Outline
- Basic Concepts
- Bayesian Networks
- Inference

Example 1: Icy Road [1]

Police Inspector Smith is waiting for Mr Holmes and Dr Watson, who are late for their appointment. Both of them are bad drivers. Smith wonders whether the road is icy, as it is snowing. Smith's secretary enters and tells him that Watson has had a car accident. Smith is afraid that Holmes has probably crashed too, since the road is icy. The secretary says the road is salted and not icy at all. Smith decides to wait 10 more minutes for Holmes rather than leaving for lunch.

[1] From "An Introduction to Bayesian Networks" by F. Jensen

Reasoning under Uncertainty

Network: H <- I -> W, where I: icy road, H: Holmes crashes, W: Watson crashes.

- The uncertainty on W influences the uncertainty on H.
- When Smith is told that Watson has had a car accident, how does he reason, i.e., what is p(H | W)?
- When the secretary then tells him the road is not icy, how does he reason, i.e., what is p(H | W, I)?

Conditional Independence

Network: y1 <- x -> y2 (x is the parent of both y1 and y2).

- When x is not known, y1 and y2 are dependent: p(y1, y2) ≠ p(y1) p(y2). The uncertainties on y1 and y2 influence each other.
- When x is given, y1 and y2 are independent. This is called conditional independence:
  p(y1 | x, y2) = p(y1 | x)
  p(y2 | x, y1) = p(y2 | x)
  p(y1, y2 | x) = p(y1 | x) p(y2 | x)
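As a concrete check (the numbers come from the icy-road CPTs given later in the "Example 1 Revisited" slide; the arithmetic is mine, not on this slide), take x = I, y1 = H, y2 = W:

p(H = y, W = y) = Σ_I p(I) p(H = y | I) p(W = y | I) = 0.7 × 0.8 × 0.8 + 0.3 × 0.1 × 0.1 = 0.451
p(H = y) p(W = y) = 0.59 × 0.59 ≈ 0.348 ≠ 0.451

So H and W are marginally dependent, but given I they decouple: p(H = y | I = y, W = y) = p(H = y | I = y) = 0.8.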

Uncertainty Propagation and Blocking

[Figure: the network y1 <- x -> y2 shown twice: on the left x is unobserved and uncertainty propagates; on the right x is observed and propagation is blocked at x.]

- Evidence on y2 changes the uncertainty on y2, i.e., p(y2).
- This change is propagated through the network, i.e., p(x | y2) and p(y1 | y2).
- Propagation stops when it is blocked at a node that has evidence (i.e., is observed): p(y1 | y2, x) = p(y1 | x).

Example 2: Wet Grass [2]

Network: W <- R -> H <- S, where H: Holmes' lawn is wet, W: Watson's lawn is wet, R: rain, S: the sprinkler was on.

- Holmes finds that his lawn is wet (H). It may be due to rain (R), or he may have forgotten to turn off the sprinkler (S). How does he reason?
- He then notices that his neighbor, Watson, also has a wet lawn (W). How does he reason now?

[2] From "An Introduction to Bayesian Networks" by F. Jensen

Conditional Dependence

Network: x1 -> y <- x2 (y is a child of both x1 and x2).

- When y is not known, x1 and x2 are independent: p(x1, x2) = p(x1) p(x2).
- When y is observed and given, x1 and x2 are no longer independent: p(x1, x2 | y) ≠ p(x1 | y) p(x2 | y).
- This is called conditional dependence.

Propagation

[Figure: two copies of a network over x1, x2, y1, y2 in which x1 and x2 are parents of y1, and y2 is linked to x1.]

- When y1 is known, it influences the rest of the network: p(x1 | y1), p(x2 | y1), p(y2 | y1).
- When y1 is known, x1 and x2 become dependent given y1.
- When y2 is known, it influences x1, which in turn influences x2.

Exercise: Earthquake or Burglary [3]

Network: B -> A <- E and E -> R, where A: the burglar alarm has gone off, B: burglary, E: earthquake, R: radio report.

- Holmes is told that the burglar alarm in his house has gone off. How does he reason as he rushes back home?
- The radio reports a small earthquake. How does this influence his reasoning?

[3] From "An Introduction to Bayesian Networks" by F. Jensen

Three Types of Connections

[Figure: three small networks: a serial connection A -> B -> C, a diverging connection A <- B -> C, and a converging connection A -> B <- C, each with evidence entering at an end node.]

- Serial connection: evidence is transmitted unless the state of the connecting node is known.
- Diverging connection: evidence is transmitted unless the connecting node is known.
- Converging connection: evidence is transmitted only if (1) the connecting node is known, or (2) one of its descendants has received evidence.

d-separation

Two nodes are d-separated if, on every path between them, there is an intermediate node such that either
- its connection is serial or diverging and its state is known, or
- its connection is converging and neither it nor any of its descendants has received evidence.
In that case, evidence cannot be transmitted between the two nodes.

Let's examine the following example: [Figure: an example network over nodes A through H, with evidence entering at several nodes.] The neighbors of E are all known. Are E and F d-separated?
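The transmission rules above are exactly what a d-separation test implements. As an illustration (my own sketch, not from the slides), the check can be done with the classic construction: X and Y are d-separated by Z iff they are disconnected in the moralized ancestral graph of {X, Y} ∪ Z after removing Z. The `parents` dictionary and function names below are mine; the example network is the icy-road one.

```python
def ancestors(parents, nodes):
    """All ancestors of `nodes` (including the nodes themselves)."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, x, y, given):
    """True if x and y are d-separated given the set `given`."""
    keep = ancestors(parents, {x, y} | set(given))
    # Moralize: undirected parent-child edges, plus edges between co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents.get(child, []) if p in keep]
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # Remove the observed nodes and test connectivity from x to y.
    blocked = set(given)
    stack, seen = [x], {x}
    while stack:
        n = stack.pop()
        if n == y:
            return False          # a path survives -> not d-separated
        for m in adj[n]:
            if m not in seen and m not in blocked:
                seen.add(m)
                stack.append(m)
    return True

# Icy-road network: I is the parent of H and W.
parents = {"H": ["I"], "W": ["I"], "I": []}
print(d_separated(parents, "H", "W", set()))    # False: dependent a priori
print(d_separated(parents, "H", "W", {"I"}))    # True: independent given I
```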

Markov Blanket

The Markov blanket of a variable A consists of:
- the parents of A
- the children of A
- the variables that share a child with A

If all variables in the Markov blanket of A are known, then A is d-separated from the rest of the network.

Let's look at the example again: [Figure: the same network over nodes A through H.] What is the Markov blanket of E?
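To make the definition concrete, here is a small Python helper (my own illustration, reusing the parents-dictionary representation from the sketch above and the exercise's burglary network) that collects a node's Markov blanket:

```python
def markov_blanket(parents, node):
    """Parents of `node`, its children, and the other parents of its children."""
    blanket = set(parents.get(node, []))                 # parents of the node
    for child, ps in parents.items():
        if node in ps:
            blanket.add(child)                           # children of the node
            blanket.update(p for p in ps if p != node)   # co-parents
    return blanket

# Burglary network: B -> A <- E and E -> R.
parents = {"A": ["B", "E"], "R": ["E"], "B": [], "E": []}
print(markov_blanket(parents, "B"))   # {'A', 'E'}: child A and A's other parent E
```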

Outline
- Basic Concepts
- Bayesian Networks
- Inference

Definition

A Bayesian network consists of the following:
- A set of random variables (nodes) and a set of directed links representing conditional probabilities.
- The nodes and links form a directed acyclic graph (DAG).
- Each node A with parents B1, ..., Bn is associated with a conditional probability p(A | B1, ..., Bn).

The Chain Rule

- The statistical properties of a Bayesian network are completely characterized by the joint distribution of all the nodes.
- Marginals are obtained by integration (summation) and Bayes' rule.
- The nice property of a Bayesian network is the factorization of this large joint distribution.
- Suppose the BN has X = {x1, ..., xn}. Then

  p(X) = p(x1, ..., xn) = ∏_{i=1}^{n} p(xi | pa(xi)),

  where pa(xi) denotes the parents of xi. This can be proved by induction.
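For instance, for the icy-road network of Example 1, where I is the parent of both H and W, the factorization is p(I, H, W) = p(I) p(H | I) p(W | I); for the wet-grass network of Example 2 it is p(R, S, W, H) = p(R) p(S) p(W | R) p(H | R, S). These are exactly the products that appear in the sum-product slides below.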

Example 1 Revisited

Network: H <- I -> W, where I: icy road, H: Holmes crashes, W: Watson crashes.

The priors are P(I = y) = 0.7, P(I = n) = 0.3. The conditional distributions are p(H | I) and p(W | I):

p(H | I)   I = y   I = n
H = y      0.8     0.1
H = n      0.2     0.9

p(W | I)   I = y   I = n
W = y      0.8     0.1
W = n      0.2     0.9

Example 1 Revisited

- When W is observed, i.e., W = y or W = n, how does this evidence propagate? Let's work out p(I | W = y) and p(H | W = y).
- When I is observed, i.e., I = y or I = n, how does this evidence propagate?
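Carrying out the first computation with the CPTs above (the arithmetic is mine, not on the slide):

p(W = y) = 0.8 × 0.7 + 0.1 × 0.3 = 0.59
p(I = y | W = y) = p(W = y | I = y) P(I = y) / p(W = y) = 0.56 / 0.59 ≈ 0.95
p(H = y | W = y) = Σ_I p(H = y | I) p(I | W = y) ≈ 0.8 × 0.95 + 0.1 × 0.05 ≈ 0.76

So Watson's accident raises the belief that the road is icy from 0.70 to about 0.95, and the belief that Holmes has crashed from 0.59 to about 0.76.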

Example 2 Revisited

The priors are p(R) = (0.2, 0.8) and p(S) = (0.1, 0.9). The conditionals are:

p(W | R)   R = y   R = n
W = y      1       0.2
W = n      0       0.8

p(H | R, S)   R = y    R = n
S = y         (1, 0)   (0.9, 0.1)
S = n         (1, 0)   (0, 1)

(Each pair gives (p(H = y), p(H = n)).)

Outline
- Basic Concepts
- Bayesian Networks
- Inference

Inference in Bayesian Networks

- The nodes can be divided into two categories:
  - Xh: nodes that are unknown or hidden
  - Xe: nodes that receive evidence and are known
- Inference is to estimate the hidden nodes Xh based on the known ones, i.e., to find p(Xh | Xe).
- It is simply p(Xh | Xe) ∝ p(X) = p(Xh, Xe) = p(Xe | Xh) p(Xh).
- Once the BN is specified, the information is complete; inference is just a matter of marginalization.
- The structure of the BN makes this marginalization easier.
- We call p(xi | Xe), for xi ∈ Xh, the belief of xi.

Sum-Product

Let's use Example 1:

p(H | W) ∝ Σ_I p(I, H, W) = Σ_I p(H | I) p(W | I) p(I)

- H receives messages propagated from W through I.
- W sends I the message p(I | W).
- I combines it with p(I) and p(H | I) (product).
- I is the node where all messages are integrated (sum).
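As an illustration (my own code, using the CPTs from the "Example 1 Revisited" slide), the same sum over I can be written directly as inference by enumeration:

```python
# Icy-road CPTs from the slides.
p_I = {"y": 0.7, "n": 0.3}                      # prior P(I)
p_H_given_I = {"y": {"y": 0.8, "n": 0.2},       # p(H | I): p_H_given_I[I][H]
               "n": {"y": 0.1, "n": 0.9}}
p_W_given_I = {"y": {"y": 0.8, "n": 0.2},       # p(W | I): p_W_given_I[I][W]
               "n": {"y": 0.1, "n": 0.9}}

def belief_H_given_W(w):
    """p(H | W = w), obtained by summing out I from p(I, H, W)."""
    unnorm = {}
    for h in ("y", "n"):
        unnorm[h] = sum(p_I[i] * p_H_given_I[i][h] * p_W_given_I[i][w]
                        for i in ("y", "n"))
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

print(belief_H_given_W("y"))   # roughly {'y': 0.76, 'n': 0.24}
```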

Sum-Product

Let's use Example 2:

p(R | W, H) ∝ p(W | R) p(R) Σ_S p(H | R, S) p(S)

- R receives three things (product):
  - its prior p(R)
  - the message combined at W
  - the message combined at S (d-connected)

Another one:

p(S | W, H) ∝ p(S) Σ_R p(W | R) p(H | R, S) p(R)

- S receives two things (product):
  - its local prior p(S)
  - the message combined at R (sum over R)
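Plugging in the CPTs from the "Example 2 Revisited" slide (the arithmetic is mine) with both W = y and H = y observed:

p(R = y | W = y, H = y) ∝ 1 × 0.2 × (1 × 0.1 + 1 × 0.9) = 0.2
p(R = n | W = y, H = y) ∝ 0.2 × 0.8 × (0.9 × 0.1 + 0 × 0.9) = 0.0144
→ p(R = y | W = y, H = y) = 0.2 / 0.2144 ≈ 0.93

p(S = y | W = y, H = y) ∝ 0.1 × (1 × 1 × 0.2 + 0.2 × 0.9 × 0.8) = 0.0344
p(S = n | W = y, H = y) ∝ 0.9 × (1 × 1 × 0.2 + 0.2 × 0 × 0.8) = 0.18
→ p(S = y | W = y, H = y) = 0.0344 / 0.2144 ≈ 0.16

Rain explains both wet lawns well, so the belief in rain rises to about 0.93 while the belief in the sprinkler stays low at about 0.16; this is the conditional dependence ("explaining away") effect from the earlier slide.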